Abstract: In this paper we provide some general convergence results for adaptive
designs for treatment comparison, both in the absence and presence of covariates. In
particular, we demonstrate the almost sure convergence of the treatment allocation
proportion for a vast class of adaptive procedures, also including designs that have not been
formally investigated but mainly explored through simulations, such as Atkinson's
optimum biased coin design, Pocock and Simon's minimization method and some of its
generalizations. Even if the large majority of the proposals in the literature rely on
continuous allocation rules, our results make it possible to prove, within a single
mathematical framework, the convergence of adaptive allocation methods based on both
continuous and discontinuous randomization functions. Although several examples of earlier
works are included in order to enhance the applicability, our approach provides substantial
insight for future suggestions, especially in the absence of a prefixed target and for
designs characterized by sequences of allocation rules.

Keywords and phrases: Biased Coin Design, CARA Procedures, Minimization methods,
Response-Adaptive Designs, Sequential Allocation.



1. Introduction 

The past five decades have witnessed a sizeable amount of statistical research on adap- 
tive designs in the context of clinical trials for treatment comparison. These are sequen- 
tial procedures where at each step the accrued information is used to make decisions 
about the way of randomizing the allocation of the next subject. 

Starting from Efron's pioneering Biased Coin Design (BCD) [9], several authors have
suggested adaptive procedures that, by taking into account at each step only previous
assignments, are aimed at achieving balance between two available treatments
(see e.g. [3, 24, 25, 26, 29]). We shall refer to these as Assignment-Adaptive
methods. Since clinical trials usually involve additional information on the experimental
units, expressed by a set of important covariates/prognostic factors, Pocock and Simon
[19] and other authors (see for instance [1, 5, 8, 27]) proposed Covariate-Adaptive
designs. These methods modify the allocation probabilities at each step according to
the assignments and the characteristics of previous statistical units, as well as those of
the present subject, in order to ensure balance between the treatment groups across
covariates and thereby reduce possible sources of heterogeneity.

Motivated by ethical demands, a different viewpoint is represented by Response-Adaptive
randomization methods. These are allocation rules introduced with the aim of skewing
the assignments towards the treatment that appears to be superior at each step (see e.g.
[2]) or, more generally, of converging to a desired target allocation of the treatments
which combines inferential and ethical concerns [4, 28]. The above-mentioned framework
has recently been extended in order to incorporate covariates, which has led to
the introduction of the so-called Covariate-Adjusted Response-Adaptive (CARA) procedures,
i.e. allocation methods that sequentially modify the treatment assignments on
the basis of earlier responses and allocations, past covariate profiles and the
characteristics of the subject under consideration. See [22, 31] and the cornerstone book by Hu
and Rosenberger [15].

In general, given a desired target it is possible to adopt different procedures converg- 
ing to it, such as the Sequential Maximum Likelihood design [18], the Doubly-adaptive 
BCD [10, 16] and their extensions with covariates given by Zhang et al.'s CARA de- 
sign [31] and the Covariate-adjusted Doubly-adaptive BCD [30], having well estab- 
lished asymptotic properties. However, in the absence of a given target one of the main 
problems lies in providing the asymptotic behaviour of the suggested procedure. This 
is especially true in the presence of covariates, where theoretical results seem to be few 
and the properties of the suggested procedures have been explored extensively through 
simulations; indeed, as stated by Rosenberger and Sverdlov [21] "very little theoretical 
work has been done in this area, despite the proliferation of papers". For instance, even 
if Pocock and Simon's minimization method is widely used in clinical practice,
its theoretical properties are still largely unknown (indeed, Hu and Hu's results [14] 
do not apply to this procedure), as well as the properties of several extensions of the 
minimization method and of Atkinson's Biased Coin Design [1]. 

Moreover, although the large majority of the proposals are based on continuous and 
prefixed allocation rules, updated step by step on the basis of the current allocation pro- 
portion and some estimates of the unknown parameters (usually based on the sufficient 
statistics of the model), the recent literature tends to concentrate on discontinuous ran- 
domization functions, such as the Efficient Randomized-Adaptive Design (ERADE) 
[17], because of their low variability. 

In this paper we provide some general convergence results for adaptive allocation
procedures both in the absence and presence of covariates, continuous or categorical.
By combining the concept of downcrossing (originally introduced in [13]) and stopping
times of stochastic processes we demonstrate the almost sure convergence of the
treatment allocation proportion for a large class of adaptive procedures, even in the absence
of a given target, and thus our approach provides substantial insight for future
suggestions as well as for several existing procedures that have not been theoretically explored
[12, 23]. In particular, we prove that Pocock and Simon's minimization method [19]
is asymptotically balanced, both marginally and jointly, showing also the convergence
to balance of Atkinson's BCD [1]. The suggested approach makes it possible to prove,
within a single mathematical framework, the convergence of continuous and discontinuous
randomization functions (e.g. the Doubly-Adaptive Weighted Differences design
[11], the Reinforced Doubly-adaptive BCD [6], ERADE [17] and Hu and Hu's procedure
[14]), taking also into account designs based on Markov chain structures, such as
the Adjustable BCD [3] and the Covariate-adaptive BCD [5], that can be characterized
by sequences of allocation rules. Moreover, by removing some unessential conditions
usually assumed in the literature, our results make it possible to provide suitable
extensions of several existing procedures.

The paper is structured as follows. Even if Assignment-Adaptive and Response-Adaptive
procedures can be regarded as special cases of CARA designs, we will treat
them separately for the sake of clarity, whereas Covariate-Adaptive methods will be
discussed as a particular case of CARA rules. Starting from the notation in Section 2,
Section 3 deals with Assignment-Adaptive designs, while Section 4 discusses
Response-Adaptive procedures. Sections 5 and 6 illustrate the asymptotic behaviour of CARA
methods in the case of continuous and categorical covariates, respectively.

2. Notation 

Suppose that patients come to the trial sequentially and are assigned to one of two
treatments, A and B, that we want to compare. At each step $i \ge 1$, a subject will be assigned
to one of the treatments and a response $Y_i$ will be observed. Typically, the outcome $Y_i$
will depend on the treatment, but it may also depend on some characteristics of the
subject expressed by a vector $Z_i$ of covariates/concomitant variables. We assume that
$\{Z_i\}_{i \ge 1}$ are i.i.d. covariates that are not under the experimenters' control, but they can
be measured before assigning a treatment, and, conditionally on the treatments and the
covariates (if present), patients' responses are assumed to be independent. Let $\delta_i$
denote the $i$th allocation, with $\delta_i = 1$ if the $i$th subject is assigned to A and $0$ otherwise;
also, $N_n = \sum_{i=1}^{n} \delta_i$ is the number of allocations to A after $n$ assignments and $\pi_n$ the
corresponding proportion, i.e. $\pi_n = n^{-1} N_n$.

In general, adaptive allocation procedures can be divided into four different categories
according to the experimental information used for allocating the patients to the
treatments. Suppose that the $(n+1)$st subject is ready to be randomized; if the probability
of assigning treatment A depends on:

i) the past allocations, i.e. $\Pr(\delta_{n+1} = 1 \mid \delta_1, \dots, \delta_n)$, we call such a procedure
Assignment-Adaptive (AA);

ii) earlier allocations and responses, i.e. $\Pr(\delta_{n+1} = 1 \mid \delta_1, \dots, \delta_n; Y_1, \dots, Y_n)$,
then the design is Response-Adaptive (RA);

iii) the previous allocations and covariates, as well as the covariate of the present
subject, i.e. $\Pr(\delta_{n+1} = 1 \mid \delta_1, \dots, \delta_n; Z_1, \dots, Z_n, Z_{n+1})$, the procedure is
Covariate-Adaptive (CA);

iv) the assignments, the outcomes and the covariates of the previous statistical units,
as well as the characteristics of the current subject that will be randomized, i.e.
$\Pr(\delta_{n+1} = 1 \mid \delta_1, \dots, \delta_n; Y_1, \dots, Y_n; Z_1, \dots, Z_{n+1})$, then the rule is called
Covariate-Adjusted Response-Adaptive (CARA).

From now on we will denote with $\Im_n$ the $\sigma$-algebra representing the natural history
of the experiment up to step $n$ associated with a given procedure belonging to each
category (with $\Im_0$ the trivial $\sigma$-field). For instance, in the case of AA rules,
$\Im_n = \sigma\{\delta_1, \dots, \delta_n\}$, whereas for RA designs $\Im_n = \sigma\{\delta_1, \dots, \delta_n; Y_1, \dots, Y_n\}$. Even if
the large majority of suggested procedures assume continuous allocation rules, in this
paper we also take into account designs with discontinuous randomization functions,
provided that their set of discontinuities is nowhere dense.

3. Assignment-Adaptive Designs 

In this section we shall deal with AA procedures such that

$$\Pr(\delta_{n+1} = 1 \mid \Im_n) = \varphi^{AA}(\pi_n), \quad \text{for } n \ge 1, \tag{3.1}$$

where $\varphi^{AA} : [0;1] \to [0;1]$.

Definition 3.1. Let $\psi : [0;1] \to [0;1]$; a point $t \in [0;1]$ is called a downcrossing of
$\psi$ if

$$\forall x < t,\ \psi(x) \ge t \quad \text{and} \quad \forall x > t,\ \psi(x) \le t.$$

Note that if the allocation function $\psi(x)$ is decreasing, then there exists a single
downcrossing $t \in (0;1)$ and if the equation $\psi(x) = x$ admits a solution then the
downcrossing coincides with it. Clearly, if $\psi(\cdot)$ is a continuous and decreasing function,
then $t$ can be found directly by solving the equation $\psi(x) = x$.

Theorem 3.1. If the allocation function $\varphi^{AA}(\cdot)$ in (3.1) has a unique downcrossing
$t \in (0;1)$, then $\lim_{n \to \infty} \pi_n = t$ a.s.

Proof. Let $\Delta M_i = \delta_i - E(\delta_i \mid \Im_{i-1})$, where $\Im_n = \sigma\{\delta_1, \dots, \delta_n\}$. Then
$\{\Delta M_i;\ i \ge 1\}$ is a sequence of bounded martingale differences with $|\Delta M_i| \le 1$ for any
$i \ge 1$; thus the sequence $\{M_n = \sum_{i=1}^{n} \Delta M_i\}$ is a martingale with
$\sum_{k=1}^{n} E[(\Delta M_k)^2 \mid \Im_{k-1}] \le n$, so that as $n$ tends to infinity $n^{-1} M_n \to 0$ a.s.
Let $l_n = \max\{s : 1 \le s \le n,\ \pi_s \le t\}$, with $\max \emptyset = 0$; then at each step $i > l_n$ we have
$\varphi^{AA}(\pi_i) \le t$. Note that

$$
\begin{aligned}
N_n &= N_{l_n+1} + \sum_{k=l_n+2}^{n} \Delta M_k + \sum_{k=l_n+2}^{n} E(\delta_k \mid \Im_{k-1}) \\
    &\le N_{l_n} + 1 + M_n - M_{l_n+1} + \sum_{k=l_n+2}^{n} \varphi^{AA}(\pi_{k-1}) \\
    &\le N_{l_n} + 1 + M_n - M_{l_n+1} + \sum_{k=l_n+2}^{n} t
\end{aligned}
$$

and, since $N_{l_n} \le l_n t$, then

$$N_n - nt \le M_n - M_{l_n+1} + 1 - t. \tag{3.2}$$

As $n \to \infty$, then $l_n \to \infty$ or $\sup_n l_n < \infty$, and in either case the r.h.s. of (3.2), divided
by $n$, goes to $0$ a.s. Thus $[\pi_n - t]^+ \to 0$ a.s. and, analogously, $[(1 - \pi_n) - (1 - t)]^+ \to 0$ a.s.
Therefore $\lim_{n \to \infty} \pi_n = t$ a.s. □
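
As a numerical companion to Definition 3.1, the following minimal Python sketch locates a downcrossing by bisection. It relies on an observation that follows directly from the definition: if t is a downcrossing of psi, then psi(x) >= t > x for x < t and psi(x) <= t < x for x > t, so the sign of psi(x) - x reveals on which side of t a point lies, whether or not psi is continuous. The function name `find_downcrossing` and the test functions are illustrative assumptions, not part of the paper, and the routine presupposes that a unique downcrossing exists.

```python
from typing import Callable


def find_downcrossing(psi: Callable[[float], float], tol: float = 1e-10) -> float:
    """Bisection for the (assumed unique) downcrossing of psi on [0, 1].

    For a downcrossing t, psi(x) > x when x < t and psi(x) < x when x > t,
    so comparing psi(mid) with mid tells us which half contains t.
    """
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        value = psi(mid)
        if value > mid:      # mid lies to the left of t
            lo = mid
        elif value < mid:    # mid lies to the right of t
            hi = mid
        else:                # psi(mid) == mid can only occur at t itself
            return mid
    return 0.5 * (lo + hi)


def efron(x: float, p: float = 2.0 / 3.0) -> float:
    """Discontinuous rule of Efron's type with bias parameter p."""
    if x < 0.5:
        return p
    if x > 0.5:
        return 1.0 - p
    return 0.5


def smooth_decreasing(x: float) -> float:
    """A continuous decreasing rule chosen only for illustration."""
    return 1.0 - x * x


if __name__ == "__main__":
    print(find_downcrossing(efron))
    print(find_downcrossing(smooth_decreasing))
```

For the continuous rule psi(x) = 1 - x^2 the downcrossing is the root of x^2 + x - 1 = 0 in (0; 1), while for the discontinuous Efron-type rule it is the jump point 1/2.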

Example 3.1. The completely randomized (CR) design is defined by letting
$\Pr(\delta_{n+1} = 1 \mid \Im_n) = 1/2$ for every $n$. This corresponds to assuming $\varphi^{CR}(x) = 1/2$ for all
$x \in [0;1]$, which is continuous and does not depend on $x$; therefore $\varphi^{CR}(\cdot)$ has a
single downcrossing $t = 1/2$ and thus $\pi_n \to 1/2$ a.s. as $n \to \infty$.

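Example 3.2. Efron's BCD [9] is defined by

$$
\Pr(\delta_{n+1} = 1 \mid \Im_n) =
\begin{cases}
p, & \text{if } D_n < 0, \\
1/2, & \text{if } D_n = 0, \\
1 - p, & \text{if } D_n > 0,
\end{cases}
\quad \text{for } n \ge 1,
$$

where $D_n = 2N_n - n$ is the difference between the allocations to A and B after $n$ steps
and $p \in [1/2;1]$ is the bias parameter. Since $\operatorname{sgn} D_n = \operatorname{sgn}(\pi_n - 1/2)$, Efron's
rule corresponds to

$$
\varphi^{E}(x) =
\begin{cases}
p, & \text{if } x < 1/2, \\
1/2, & \text{if } x = 1/2, \\
1 - p, & \text{if } x > 1/2,
\end{cases}
\tag{3.3}
$$

which has a single downcrossing $t = 1/2$, since $\varphi^{E}(1/2) = 1/2$, and therefore
$\lim_{n \to \infty} \pi_n = 1/2$ a.s. Clearly, Theorem 3.1 makes it possible to provide suitable extensions
of Efron's coin converging to any given desired target $t^* \in (0;1)$, namely

$$
\varphi^{\tilde{E}}(x) =
\begin{cases}
p_2, & \text{if } x < t^*, \\
t^*, & \text{if } x = t^*, \\
p_1, & \text{if } x > t^*,
\end{cases}
\tag{3.4}
$$

where $0 \le p_1 \le t^* \le p_2 \le 1$ and at least one of these inequalities must hold strictly.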
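
The convergence claimed for the extended coin (3.4) can be explored by simulation. The sketch below is only illustrative: the function names, the fair-coin assignment of the first subject, and the particular values of the target and of the two probabilities are assumptions made here, not choices taken from the paper. The two probabilities are deliberately asymmetric to reflect that no symmetry is needed.

```python
import random


def extended_efron_rule(x: float, target: float, p1: float, p2: float) -> float:
    """Allocation function of type (3.4): p2 below the target, p1 above it."""
    if x < target:
        return p2
    if x > target:
        return p1
    return target


def simulate_extended_efron(n: int, target: float, p1: float, p2: float,
                            seed: int = 0) -> float:
    """Return the proportion of assignments to A after n subjects."""
    rng = random.Random(seed)
    # Assumption: the first subject is assigned by a fair coin.
    n_a = 1 if rng.random() < 0.5 else 0
    for k in range(1, n):
        # k subjects have been assigned so far; n_a / k is the current proportion.
        prob_a = extended_efron_rule(n_a / k, target, p1, p2)
        if rng.random() < prob_a:
            n_a += 1
    return n_a / n


if __name__ == "__main__":
    # Asymmetric coin aiming at the hypothetical target 0.3.
    print(simulate_extended_efron(20_000, target=0.3, p1=0.1, p2=0.8, seed=1))
```

With a long enough run the printed proportion should sit close to the chosen target, in line with Theorem 3.1; a single simulated path is of course only a sanity check, not a proof.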
Remark 3.1. Note that from Theorem 3.1:

i) the continuity of the allocation rule is not required and therefore it is possible to
consider discontinuous randomization functions like, e.g., (3.3) and (3.4);

ii) for the convergence to a given desired target $t^*$, condition $\varphi^{AA}(t^*) = t^*$ is not
required; moreover, symmetry of the allocation function is not
needed (e.g., in (3.4) condition $p_2 = 1 - p_1$ is not required), even if it is
typically assumed in order to treat A and B in the same way.

For instance, from Theorem 3.1 the following AA procedure

$$
\tilde{\varphi}(x) =
\begin{cases}
1, & \text{if } x \le 1/2, \\
1/2, & \text{if } x > 1/2,
\end{cases}
$$

is asymptotically balanced, i.e. $\pi_n \to 1/2$ a.s. as $n$ tends to infinity.
Corollary 3.1. Suppose that $\varphi^{AA}$ is a composite function such that $\varphi^{AA}(x) = h_1[h_2(x)]$,
where $h_1 : D \subseteq \mathbb{R} \to [0;1]$ is decreasing and $h_2 : [0;1] \to D$ is continuous and
increasing. If $d \in D$ is such that $h_1(d) = h_2^{-1}(d)$, then $\lim_{n \to \infty} \pi_n = h_2^{-1}(d)$ a.s.

Proof. The proof follows easily from Theorem 3.1. Indeed, $\varphi^{AA}(\cdot)$ is a decreasing
function with $\varphi^{AA}[h_2^{-1}(d)] = h_1(d) = h_2^{-1}(d)$ and therefore $\varphi^{AA}(\cdot)$ has a single
downcrossing in $h_2^{-1}(d)$. □

Example 3.3. Wei [29] defined his Adaptive BCD by letting

$$\Pr(\delta_{n+1} = 1 \mid \Im_n) = f(2\pi_n - 1), \quad \text{for } n \ge 1,$$

where $f : [-1;1] \to [0;1]$ is a continuous and decreasing function s.t. $f(-x) = 1 - f(x)$.
Set $g(x) = 2x - 1 : [0;1] \to [-1;1]$; Wei's allocation function is $\varphi^{W}(x) = f[g(x)]$.
Since $g^{-1}(w) = (w+1)/2$ for all $w \in [-1;1]$, then $g^{-1}(0) = 1/2 = f(0)$, i.e. $1/2$ is the
only downcrossing of $\varphi^{W}(\cdot)$. Therefore, from Corollary 3.1 it follows that $\pi_n \to 1/2$
a.s. as $n \to \infty$.

Remark 3.2. Note that Theorem 3.1 still holds even if we assume different randomization
functions at each step by letting $\Pr(\delta_{n+1} = 1 \mid \Im_n) = \varphi_n^{AA}(\pi_n)$, provided that
$t \in (0;1)$ is the unique downcrossing of $\varphi_n^{AA}(\cdot)$ for every $n \ge 1$.

Example 3.4. The Adjustable Biased Coin Design (ABCD) proposed by Baldi Antognini
and Giovagnoli (2004) is defined as follows. Let $F(\cdot) : \mathbb{R} \to [0;1]$ be a
decreasing function such that $F(-x) = 1 - F(x)$; the ABCD assigns the $(n+1)$st
subject to treatment A with probability $\Pr(\delta_{n+1} = 1 \mid \Im_n) = F(D_n)$, for $n \ge 1$. This
corresponds to letting

$$\varphi_n^{ABCD}(x) = F[n(2x - 1)], \quad n \ge 1,$$

and, from the properties of $F(\cdot)$, at each step $n$ the function $\varphi_n^{ABCD}(\cdot)$ is decreasing
with $\varphi_n^{ABCD}(1/2) = 1/2$. Thus $t = 1/2$ is the only downcrossing of $\varphi_n^{ABCD}(\cdot)$ for
every $n$, so that $\lim_{n \to \infty} \pi_n = 1/2$ a.s.
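
To make the step-dependent rule of the ABCD concrete, here is a small simulation sketch. The logistic-type function used for F is a hypothetical choice made only because it is decreasing and satisfies F(-x) = 1 - F(x); it is not claimed to be one of the functions proposed by the original authors. Starting from D_0 = 0 (so that the first subject receives a fair coin) is likewise an assumption.

```python
import math
import random


def logistic_f(d: float) -> float:
    """Hypothetical F: decreasing on R with F(-d) = 1 - F(d).

    Equal to 1 / (1 + exp(d)), written with tanh to avoid overflow.
    """
    return 0.5 * (1.0 - math.tanh(d / 2.0))


def simulate_abcd(n: int, seed: int = 0) -> float:
    """Simulate n assignments and return the proportion allocated to A."""
    rng = random.Random(seed)
    d = 0  # D_0 = 0, so the first subject is assigned with probability 1/2
    for _ in range(n):
        assigned_to_a = rng.random() < logistic_f(d)
        d += 1 if assigned_to_a else -1
    # D_n = 2 N_n - n, hence pi_n = N_n / n = (D_n + n) / (2 n).
    return (d + n) / (2 * n)


if __name__ == "__main__":
    print(simulate_abcd(5_000, seed=3))
```

Because the rule is expressed through D_n rather than through the proportion, the induced allocation function changes with n, which is exactly the situation covered by Remark 3.2.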



4. Response-Adaptive designs 

RA rules were originally introduced as a possible solution to local optimality problems 
in a parametric setup, where there exists a desired target allocation of the treatments 
which depends on the unknown model parameters [20]. Recently, they have been also 
suggested for ethical purposes, with the aim of skewing at each step the allocations 
towards the treatment that appears to be superior (see e.g. [11]). 

Suppose that the probability law of the responses under treatments A and B depends
on a vector of unknown parameters $\gamma_A$ and $\gamma_B$, respectively, with $\gamma^t = (\gamma_A^t, \gamma_B^t) \in \Omega$,
where $\Omega$ is an open convex subset of $\mathbb{R}^k$. Starting with $m$ observations on each
treatment, usually assigned by using restricted randomization, an initial non-trivial
parameter estimate $\hat{\gamma}_{2m}$ is derived. Then, at each step $n \ge 2m$ let $\hat{\gamma}_n$ be the estimator
of the parameter $\gamma$ based on the first $n$ observations, which is assumed to be consistent
in the i.i.d. case. In this section we shall deal with RA procedures such that

$$\Pr(\delta_{n+1} = 1 \mid \Im_n) = \varphi^{RA}(\pi_n; \hat{\gamma}_n), \quad \text{for } n \ge 2m. \tag{4.1}$$
The following definition will help illustrate the asymptotic behaviour of RA rules and
also CARA designs with continuous covariates treated in Section 5.

Definition 4.1. Let $\psi(x; y) : [0;1] \times \mathbb{R}^d \to [0;1]$. The function $t(y) : \mathbb{R}^d \to [0;1]$ is
called a generalized downcrossing of $\psi$ if for any given $y \in \mathbb{R}^d$ we have

$$\forall x < t(y),\ \psi(x; y) \ge t(y) \quad \text{and} \quad \forall x > t(y),\ \psi(x; y) \le t(y).$$

If the function $\psi(x; y)$ is decreasing in $x$, then the generalized downcrossing $t(y)$ is
unique and $t(y) \in (0;1)$ for any $y \in \mathbb{R}^d$. Moreover, if there exists a solution of the
equation $\psi(x; y) = x$, then $t(y)$ coincides with this solution.

Theorem 4.1. Suppose that at each step $n$ the allocation rule $\varphi^{RA}(\pi_n; \hat{\gamma}_n)$ is
decreasing in $\pi_n$. If the only generalized downcrossing $t(\hat{\gamma}_n)$ is a continuous function,
then $\lim_{n \to \infty} \pi_n = t(\gamma)$ a.s.

Proof. See Appendix A.1. □

Example 4.1. Geraldes et al. (2006) introduced the Doubly Adaptive Weighted Differences
Design (DAWD) for binary response trials. Let $\gamma = (p_A, p_B)^t$ be the vector
of the probabilities of success of A and B and $\hat{\gamma}_n = (\hat{p}_{An}, \hat{p}_{Bn})^t$ the corresponding
estimate after $n$ steps. When the $(n+1)$st patient is ready to be randomized, the DAWD
allocates him/her to treatment A with probability

$$\Pr(\delta_{n+1} = 1 \mid \Im_n) = \rho\, g_1(\hat{p}_{An} - \hat{p}_{Bn}) + (1 - \rho)\, g_2(2\pi_n - 1), \quad \text{for } n \ge 2m, \tag{4.2}$$

where $\rho \in [0;1)$ represents an "ethical weight" and $g_1, g_2 : [-1;1] \to [0;1]$ are
continuous functions s.t.

i) $g_1(0) = g_2(0) = 1/2$ and $g_1(1) = g_2(-1) = 1$;
ii) $g_1(-x) = 1 - g_1(x)$ and $g_2(-x) = 1 - g_2(x)$ $\forall x \in [-1;1]$;
iii) $g_1(\cdot)$ is non decreasing and $g_2(\cdot)$ is decreasing.

Regarded as a function of $\pi_n$ and $\hat{\gamma}_n$, rule (4.2) corresponds to

$$\varphi^{DAWD}(\pi_n; \hat{\gamma}_n) = \rho\, g_1\big((1; -1)\hat{\gamma}_n\big) + (1 - \rho)\, g_2(2\pi_n - 1),$$

which is decreasing in $\pi_n$, so that the equation $\varphi^{DAWD}(\pi_n; \hat{\gamma}_n) = \pi_n$ has a unique
solution $t(\hat{\gamma}_n)$, i.e. the generalized downcrossing, which is continuous in $\hat{\gamma}_n$ (see [11]).
Thus $\lim_{n \to \infty} \pi_n = t(\gamma)$ a.s.

Often there is a desired target allocation $\pi^*$ to treatment A that depends on the
unknown model parameters, i.e. $\pi^* = \pi^*(\gamma)$, where $\pi^* : \Omega \to (0;1)$ is a mapping
that transforms a $k$-dim vector of parameters into a scalar one. Thus, Theorem 4.1 still
holds even if, instead of (4.1), we assume

$$\Pr(\delta_{n+1} = 1 \mid \Im_n) = \varphi^{RA}(\pi_n; \pi^*(\hat{\gamma}_n)), \quad \text{for } n \ge 2m,$$

provided that $\pi^*(\cdot)$ is a continuous function. In this case the generalized downcrossing
could be more properly denoted by $t(\hat{\gamma}_n) = \tilde{t}(\pi^*(\hat{\gamma}_n))$.
Example 4.2. The Doubly-adaptive Biased Coin Design (DBCD) [10, 16] is one of
the most effective families of RA procedures aimed at converging to a desired target
$\pi^*(\gamma) \in (0;1)$ that is a continuous function of the model parameters. The DBCD
assigns treatment A to the $(n+1)$st subject with probability

$$\Pr(\delta_{n+1} = 1 \mid \Im_n) = \varphi^{DBCD}(\pi_n; \pi^*(\hat{\gamma}_n)), \quad \text{for } n \ge 2m, \tag{4.3}$$

where the allocation function $\varphi^{DBCD}$ needs to satisfy the following conditions:

i) $\varphi^{DBCD}(x; y)$ is continuous on $(0;1)^2$;

ii) $\varphi^{DBCD}(x; x) = x$;

iii) $\varphi^{DBCD}(x; y)$ is decreasing in $x$ and increasing in $y$;

iv) $\varphi^{DBCD}(x; y) = 1 - \varphi^{DBCD}(1 - x; 1 - y)$ for all $(x, y) \in (0;1)^2$.

The DBCD forces the allocation proportion to the target since from conditions ii) and
iii), when $x > y$ then $\varphi^{DBCD}(x; y) < y$, whereas if $x < y$, then $\varphi^{DBCD}(x; y) > y$.
However, condition i) is quite restrictive since it does not include several widely-known
proposals based on discontinuous allocation functions, such as Efron's BCD
and its extensions [17], while condition iv) simply guarantees that A and B are treated
symmetrically.

Since $\varphi^{DBCD}(x; y)$ is decreasing in $x$ with $\varphi^{DBCD}(x; x) = x$, then the generalized
downcrossing is unique, given by $\tilde{t}(\pi^*(\hat{\gamma}_n)) = \pi^*(\hat{\gamma}_n)$. Thus, from the continuity of
the target $\pi^*(\cdot)$ it follows that $\lim_{n \to \infty} \pi_n = \pi^*(\gamma)$ a.s.
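
Conditions ii) to iv) can be checked numerically for a concrete allocation function. The sketch below uses a power-type family that is frequently associated with the DBCD in the literature; treating it as the function intended in [10, 16] is an assumption on our part, and the name `dbcd_rule` and the parameter `power` are hypothetical.

```python
def dbcd_rule(x: float, y: float, power: float = 2.0) -> float:
    """Power-type DBCD allocation function (an assumed, illustrative choice).

    x is the current allocation proportion, y the estimated target,
    both taken in the open interval (0, 1).
    """
    weight_a = y * (y / x) ** power
    weight_b = (1.0 - y) * ((1.0 - y) / (1.0 - x)) ** power
    return weight_a / (weight_a + weight_b)


if __name__ == "__main__":
    # Condition ii): the rule returns the target when the proportion is on target.
    print(dbcd_rule(0.3, 0.3))
    # Condition iii): decreasing in x for a fixed target.
    print([round(dbcd_rule(x, 0.5), 4) for x in (0.2, 0.4, 0.6)])
    # Condition iv): symmetric treatment of A and B.
    print(dbcd_rule(0.2, 0.6) + dbcd_rule(0.8, 0.4))
```

For this family the generalized downcrossing at a fixed y is y itself, which is the property used in the example above to conclude convergence to the target.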

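Example 4.3. In the same spirit of Efron's BCD, Hu, Zhang and He (2009) have recently
introduced the ERADE, which is a class of RA procedures based on discontinuous
randomization functions. Let again $\pi^*(\gamma) \in (0;1)$ be the desired target, which is
assumed to be a continuous function of the unknown model parameters; the ERADE
assigns treatment A to the $(n+1)$st patient with probability

$$
\Pr(\delta_{n+1} = 1 \mid \Im_n) =
\begin{cases}
\alpha\, \pi^*(\hat{\gamma}_n), & \text{if } \pi_n > \pi^*(\hat{\gamma}_n), \\
\pi^*(\hat{\gamma}_n), & \text{if } \pi_n = \pi^*(\hat{\gamma}_n), \\
1 - \alpha\,(1 - \pi^*(\hat{\gamma}_n)), & \text{if } \pi_n < \pi^*(\hat{\gamma}_n),
\end{cases}
\tag{4.4}
$$

where $\alpha \in [0;1)$ governs the degree of randomness. Clearly, rule (4.4) corresponds to

$$
\varphi^{ERADE}(x; y) =
\begin{cases}
\alpha y, & \text{if } x > y, \\
y, & \text{if } x = y, \\
1 - \alpha(1 - y), & \text{if } x < y,
\end{cases}
$$

which has a single generalized downcrossing $t(y) = y$; therefore $\lim_{n \to \infty} \pi_n = \pi^*(\gamma)$
a.s.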
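
A simulation sketch of the ERADE for binary responses is given below. Several ingredients are assumptions made purely for illustration and are not specified in the text: the square-root target, the smoothed success-rate estimates (used so that the estimated target stays inside (0; 1)), the alternating initial stage, and all function names.

```python
import math
import random


def erade_probability(pi_n: float, target_hat: float, alpha: float) -> float:
    """Rule (4.4): probability of assigning the next subject to A."""
    if pi_n > target_hat:
        return alpha * target_hat
    if pi_n < target_hat:
        return 1.0 - alpha * (1.0 - target_hat)
    return target_hat


def sqrt_target(p_a: float, p_b: float) -> float:
    """Illustrative target, continuous in the success probabilities (assumed)."""
    root_a, root_b = math.sqrt(p_a), math.sqrt(p_b)
    return root_a / (root_a + root_b)


def simulate_erade(n: int, p_a: float, p_b: float, alpha: float = 0.5,
                   m: int = 5, seed: int = 0) -> float:
    """Return the proportion of the n subjects allocated to A (requires n >= 2m)."""
    rng = random.Random(seed)
    counts = [0, 0]      # number of subjects on A and on B
    successes = [0, 0]   # number of successes on A and on B
    true_p = (p_a, p_b)

    def treat(arm: int) -> None:
        counts[arm] += 1
        successes[arm] += int(rng.random() < true_p[arm])

    for _ in range(m):   # initial stage: m subjects on each treatment
        treat(0)
        treat(1)
    for k in range(2 * m, n):
        # Smoothed estimates of the success probabilities (an assumption).
        est_a = (successes[0] + 0.5) / (counts[0] + 1.0)
        est_b = (successes[1] + 0.5) / (counts[1] + 1.0)
        target_hat = sqrt_target(est_a, est_b)
        prob_a = erade_probability(counts[0] / k, target_hat, alpha)
        treat(0 if rng.random() < prob_a else 1)
    return counts[0] / n


if __name__ == "__main__":
    print(simulate_erade(20_000, 0.7, 0.4, seed=2), sqrt_target(0.7, 0.4))
```

The two printed numbers, the simulated proportion and the target evaluated at the true parameters, should be close for a long run, which is what the convergence statement predicts.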

Remark 4.1. Contrary to the DBCD in (4.3) and the ERADE in (4.4), from Theorem
4.1 conditions $\varphi^{RA}(x; x) = x$ and $\varphi^{RA}(x; y) = 1 - \varphi^{RA}(1 - x; 1 - y)$ are not
required for guaranteeing the convergence to the chosen target $\pi^*(\gamma)$. For instance,
if we let

$$
\varphi^{RA}(\pi_n; \pi^*(\hat{\gamma}_n)) =
\begin{cases}
\pi^*(\hat{\gamma}_n)^{r}, & \text{if } \pi_n > \pi^*(\hat{\gamma}_n), \\
\pi^*(\hat{\gamma}_n)^{1/r}, & \text{if } \pi_n \le \pi^*(\hat{\gamma}_n),
\end{cases}
$$

where the parameter $r \ge 1$ controls the degree of randomness, then $\pi_n \to \pi^*(\gamma)$ a.s.
as $n \to \infty$.

5. CARA designs with continuous covariates 

Starting from the pioneering work of Rosenberger et al. [22], there has been a grow- 
ing statistical interest in the topic of CARA randomization procedures. These designs 
change the probabilities of allocating treatments by taking into account all the available 
information with the aim of skewing the allocations towards the superior treatment or, 
in general, of converging to a desired target allocation depending on the covariates. 
From now on we deal with CARA designs such that

$$\Pr(\delta_{n+1} = 1 \mid \Im_n, Z_{n+1} = z_{n+1}) = \varphi^{CARA}(\pi_n; \hat{\gamma}_n, S_n, f(z_{n+1})), \quad n \ge 2m, \tag{5.1}$$

where $\Im_n = \sigma(\delta_1, \dots, \delta_n; Y_1, \dots, Y_n; Z_1, \dots, Z_n)$, $\hat{\gamma}_n$ depends on earlier allocations,
covariates and responses, while $S_n = S(\delta_1, \dots, \delta_n; Z_1, \dots, Z_n)$ is a function of
the allocations and the covariates of the previous patients. In general, it is a vector of
sufficient statistics of the covariate distribution that incorporates the information on the
covariates in the treatment groups after $n$ steps and from now on we always assume
that, as $n \to \infty$,

$$S_n = S(\delta_1, \dots, \delta_n; Z_1, \dots, Z_n) \to \varsigma \quad \text{a.s.} \tag{5.2}$$

Often, $S_n$ contains the moments of the covariates within the two groups; thus, in order
to guarantee (5.2), it is sufficient that the number of assignments to A and B goes to
infinity, i.e. $\lim_{n \to \infty} N_n = \infty$ and $\lim_{n \to \infty} (n - N_n) = \infty$ a.s., since the covariates
are assumed i.i.d.

Theorem 5.1. At each step $n$, suppose that the allocation function $\varphi^{CARA}$ in (5.1) is
decreasing in $\pi_n$ and let

$$\varphi_Z(\pi_n; \hat{\gamma}_n, S_n) = E_{Z_{n+1}}\left[\varphi^{CARA}(\pi_n; \hat{\gamma}_n, S_n, f(Z_{n+1}))\right].$$

If the only generalized downcrossing $t_Z(\hat{\gamma}_n, S_n)$ of $\varphi_Z$ is jointly continuous, then

$$\lim_{n \to \infty} \pi_n = t_Z(\gamma, \varsigma) \quad \text{a.s.} \tag{5.3}$$

Proof. See Appendix A.2. □

Example 5.1. Consider the linear homoscedastic model with treatment/covariate
interactions in the following form

$$E(Y_i) = \delta_i \mu_A + (1 - \delta_i)\mu_B + z_i\left[\delta_i \beta_A + (1 - \delta_i)\beta_B\right], \quad i \ge 1,$$

where $\mu_A$ and $\mu_B$ are the baseline treatment effects, $z_i$ is a scalar covariate observed
on the $i$th individual and $\beta_A$, $\beta_B$ are possibly different regression parameters. Under
this model, adopting the "the-larger-the-better" scenario, treatment A is the best for
patient $(n+1)$ if $\mu_A + z_{n+1}\beta_A > \mu_B + z_{n+1}\beta_B$; thus, if only ethical aims are taken
into account it could be reasonable to consider the following allocation rule:

$$\varphi^{ETH}(\pi_n; \hat{\gamma}_n, S_n, f(z_{n+1})) = \mathbb{1}_{\{\hat{\mu}_{An} - \hat{\mu}_{Bn} + z_{n+1}(\hat{\beta}_{An} - \hat{\beta}_{Bn}) > 0\}},$$

where $\mathbb{1}_{\{\cdot\}}$ is the indicator function and $\hat{\gamma}_n = (\hat{\mu}_{An}, \hat{\mu}_{Bn}, \hat{\beta}_{An}, \hat{\beta}_{Bn})^t$ is the least
squares estimator of $\gamma = (\mu_A, \mu_B, \beta_A, \beta_B)^t$ after $n$ steps. Thus,

$$E_{Z_{n+1}}\left[\varphi^{ETH}(\pi_n; \hat{\gamma}_n, S_n, f(Z_{n+1}))\right] = 1 - G_Z\left(\frac{\hat{\mu}_{Bn} - \hat{\mu}_{An}}{\hat{\beta}_{An} - \hat{\beta}_{Bn}}\right), \tag{5.4}$$

where $G_Z(\cdot)$ is the cdf of $Z$. Note that (5.4) is constant in $\pi_n$, so it has a single
generalized downcrossing $t_Z(\hat{\gamma}_n, S_n) = 1 - G_Z\left(\frac{\hat{\mu}_{Bn} - \hat{\mu}_{An}}{\hat{\beta}_{An} - \hat{\beta}_{Bn}}\right)$ and therefore, from Theorem
5.1,

$$\lim_{n \to \infty} \pi_n = 1 - G_Z\left(\frac{\mu_B - \mu_A}{\beta_A - \beta_B}\right) \quad \text{a.s.}$$
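
The limiting proportion in Example 5.1 can be compared with a direct Monte Carlo evaluation of the indicator rule. The sketch below assumes a standard normal covariate and a positive slope difference (beta_A > beta_B), so that the event in the indicator is "Z exceeds the threshold"; both assumptions, as well as the function names and parameter values, are introduced here only for illustration.

```python
import math
import random


def normal_cdf(x: float) -> float:
    """Cdf of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def limiting_proportion(mu_a: float, mu_b: float, beta_a: float, beta_b: float) -> float:
    """1 - G_Z((mu_B - mu_A) / (beta_A - beta_B)) for Z ~ N(0, 1), beta_A > beta_B."""
    return 1.0 - normal_cdf((mu_b - mu_a) / (beta_a - beta_b))


def monte_carlo_proportion(mu_a: float, mu_b: float, beta_a: float, beta_b: float,
                           draws: int = 200_000, seed: int = 0) -> float:
    """Average of the ethical indicator rule over simulated covariates."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(draws):
        z = rng.gauss(0.0, 1.0)
        if mu_a - mu_b + z * (beta_a - beta_b) > 0:
            hits += 1
    return hits / draws


if __name__ == "__main__":
    args = (1.0, 1.5, 2.0, 1.0)  # hypothetical (mu_A, mu_B, beta_A, beta_B)
    print(limiting_proportion(*args), monte_carlo_proportion(*args))
```

Here the true parameters are plugged in directly; in the design itself they would be replaced by the least squares estimates, whose consistency is what allows the limit to be expressed in terms of the true values.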



Example 5.2. As in the case of RA procedures, also for CARA rules there is often a
desired target allocation $\pi^*$ to treatment A that is a function of the unknown model
parameters and the covariates, i.e. $\pi^* = \pi^*(\gamma, z)$, which is assumed to be continuous in
$\gamma$ for any fixed covariate level $z$. In particular, Zhang et al. [31] assumed a generalized
linear model setup and suggested allocating subject $(n+1)$ to A with probability

$$\Pr(\delta_{n+1} = 1 \mid \Im_n, Z_{n+1} = z_{n+1}) = \pi^*(\hat{\gamma}_n, z_{n+1}), \quad \text{for } n \ge 2m, \tag{5.5}$$

which represents an analog of the Sequential Maximum Likelihood design [18] in the
presence of covariates. Assuming that the target function $\pi^*$ is differentiable in $\gamma$,
under the expectation, with bounded derivatives, the authors showed that
$\lim_{n \to \infty} \pi_n = E_Z[\pi^*(\gamma, Z)]$ a.s.

Clearly, allocation rule (5.5) is constant in $\pi_n$ and therefore
$\varphi_Z(\pi_n; \hat{\gamma}_n, S_n) = E_{Z_{n+1}}[\pi^*(\hat{\gamma}_n, Z_{n+1})]$ is also constant in $\pi_n$. Thus, the generalized
downcrossing of $\varphi_Z$ is unique and obviously $\lim_{n \to \infty} \pi_n = E_Z[\pi^*(\gamma, Z)]$ a.s.

Remark 5.1. Some authors (see for instance [7]) suggested CARA designs that incorporate
covariate information in the randomization process, but ignore the covariate
of the current subject. Note that these methods can be regarded as special cases of
$\varphi^{CARA}$ in (5.1) and therefore Theorem 5.1 can still be applied by taking into account
the generalized downcrossing of $\varphi^{CARA}$ directly.

Even if Theorem 5.1 proves the convergence of CARA designs in the case of continuous
covariates, it could be difficult to obtain an analytical expression for $\varphi_Z$ and
therefore to find the corresponding generalized downcrossing. Nevertheless, the following
Lemma makes it possible to obtain the generalized downcrossing in a simple manner in
some circumstances.
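Lemma 5.1. Let $\varphi^{CARA}(\pi_n; \hat{\gamma}_n, S_n, f(z_{n+1}))$ be jointly continuous and, assuming
that $\varphi^{CARA}(x; \gamma, \varsigma, f(Z))$ is decreasing in $x$, let $t^*_Z(\gamma, \varsigma)$ be the unique solution of the
equation

$$\varphi^{CARA}\left(x; \gamma, \varsigma, E_Z[f(Z)]\right) = x.$$

If $\varphi^{CARA}(t^*_Z(\gamma, \varsigma); \gamma, \varsigma, f(Z))$ is linear in $f(Z)$ and $t^*_Z$ is jointly continuous, then
(5.3) still holds with $t_Z(\gamma, \varsigma) = t^*_Z(\gamma, \varsigma)$.

Proof. Assume that $t_Z(\gamma, \varsigma) < t^*_Z(\gamma, \varsigma)$. From the properties of $\varphi^{CARA}$, the function
$\varphi_Z(x; \gamma, \varsigma)$ is jointly continuous and decreasing in $x$, so that

$$t_Z(\gamma, \varsigma) = \varphi_Z(t_Z(\gamma, \varsigma); \gamma, \varsigma) \ge \varphi_Z(t^*_Z(\gamma, \varsigma); \gamma, \varsigma).$$

However,

$$\varphi_Z(t^*_Z(\gamma, \varsigma); \gamma, \varsigma) = \varphi^{CARA}\left(t^*_Z(\gamma, \varsigma); \gamma, \varsigma, E_Z[f(Z)]\right) = t^*_Z(\gamma, \varsigma),$$

since $\varphi^{CARA}(t^*_Z(\gamma, \varsigma); \gamma, \varsigma, f(Z))$ is linear in $f(Z)$, contradicting the assumption.
The argument is analogous if we assume $t_Z(\gamma, \varsigma) > t^*_Z(\gamma, \varsigma)$. □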

Example 5.3. The Covariate-adjusted Doubly-adaptive Biased Coin Design introduced
by Zhang and Hu (2009) is a class of CARA procedures intended to converge
to a desired target $\pi^*(\gamma, z)$. When the $(n+1)$st subject with covariate $Z_{n+1} = z_{n+1}$
is ready to be randomized, he/she will be assigned to A with probability

$$
\Pr(\delta_{n+1} = 1 \mid \Im_n, Z_{n+1} = z_{n+1}) =
\frac{\pi^*(\hat{\gamma}_n, z_{n+1}) \left(\frac{\hat{\rho}_n}{\pi_n}\right)^{\nu}}
{\pi^*(\hat{\gamma}_n, z_{n+1}) \left(\frac{\hat{\rho}_n}{\pi_n}\right)^{\nu} + \left[1 - \pi^*(\hat{\gamma}_n, z_{n+1})\right] \left(\frac{1 - \hat{\rho}_n}{1 - \pi_n}\right)^{\nu}},
\tag{5.6}
$$

where $\hat{\rho}_n = n^{-1} \sum_{i=1}^{n} \pi^*(\hat{\gamma}_n, z_i)$. Assuming that

$$\Pr(\delta_{n+1} = 1 \mid \Im_n, Z_{n+1} = z) \to \pi^*(\gamma, z) \quad \text{a.s.} \tag{5.7}$$

the authors proved that $\lim_{n \to \infty} \pi_n = E_Z[\pi^*(\gamma, Z)]$ a.s.

Note that rule (5.6) can be regarded as a special case of $\varphi^{CARA}$ after the transformation
$(\hat{\gamma}_n, S_n, f(z_{n+1})) \mapsto (\hat{\rho}_n, \pi^*(\hat{\gamma}_n, z_{n+1}))$ and thus, even if we remove condition
(5.7), Lemma 5.1 can be applied to the allocation function

$$\varphi^{ZH}(x; a, b) = \left\{1 + \frac{1 - b}{b} \left[\frac{(1 - a)x}{a(1 - x)}\right]^{\nu}\right\}^{-1},$$

which is decreasing in $x$ and continuous in all the arguments. Indeed, since both
$\hat{\rho}_n$ and $E_{Z_{n+1}}[\pi^*(\hat{\gamma}_n, Z_{n+1})]$ converge to $E_Z[\pi^*(\gamma, Z)]$ a.s., the solution of the
equation $\varphi^{ZH}(x; E_Z[\pi^*(\gamma, Z)], E_Z[\pi^*(\gamma, Z)]) = x$ is $t^*_Z = E_Z[\pi^*(\gamma, Z)]$.
Furthermore, since $\varphi^{ZH}(E_Z[\pi^*(\gamma, Z)]; E_Z[\pi^*(\gamma, Z)], \pi^*(\gamma, Z)) = \pi^*(\gamma, Z)$, then
$\lim_{n \to \infty} \pi_n = E_Z[\pi^*(\gamma, Z)]$ a.s.
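
The two properties of the allocation function used in Example 5.3 are easy to verify numerically. In the sketch below the function names and the value of the exponent are assumptions made for illustration; the first function is the compact form applied in the Lemma and the second is the ratio form of rule (5.6), with a playing the role of the averaged target and b that of the target at the current covariate.

```python
def phi_zh(x: float, a: float, b: float, nu: float = 2.0) -> float:
    """Compact form: {1 + (1 - b)/b * [(1 - a) x / (a (1 - x))]^nu}^(-1)."""
    odds_ratio = ((1.0 - a) * x) / (a * (1.0 - x))
    return 1.0 / (1.0 + (1.0 - b) / b * odds_ratio ** nu)


def phi_zh_ratio_form(x: float, a: float, b: float, nu: float = 2.0) -> float:
    """Ratio form of the rule, written as in (5.6)."""
    weight_a = b * (a / x) ** nu
    weight_b = (1.0 - b) * ((1.0 - a) / (1.0 - x)) ** nu
    return weight_a / (weight_a + weight_b)


if __name__ == "__main__":
    # Fixed point: with a = b the equation phi(x; a, a) = x is solved by x = a.
    print(phi_zh(0.4, 0.4, 0.4))
    # At x = a the rule returns b, i.e. it is linear in the current target.
    print(phi_zh(0.4, 0.4, 0.7))
    # The two algebraic forms agree.
    print(phi_zh(0.3, 0.45, 0.6), phi_zh_ratio_form(0.3, 0.45, 0.6))
```

The second printed value illustrates the linearity requirement of Lemma 5.1: evaluated at the candidate limit, the rule reduces to the target at the current covariate, whose expectation is the candidate limit itself.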
5.1. Covariate-Adaptive designs with continuous covariates

Theorem 5.1 and Lemma 5.1 can be naturally applied to CA designs in the presence of
continuous covariates by considering, instead of (5.1), the following class of allocation
rules:

$$\Pr(\delta_{n+1} = 1 \mid \Im_n, Z_{n+1} = z_{n+1}) = \varphi^{CA}(\pi_n; S_n, f(z_{n+1})),$$

with $\Im_n = \sigma(\delta_1, \dots, \delta_n; Z_1, \dots, Z_n)$. Clearly, $t_Z(\gamma, \varsigma)$ and $t^*_Z(\gamma, \varsigma)$ should be
replaced by $t_Z(\varsigma)$ and $t^*_Z(\varsigma)$, respectively.

We now present an application of Lemma 5.1: 

Example 5.4. Atkinson [1] considered the linear homoscedastic model without
treatment/covariate interaction in the form

$$E(Y_i) = \delta_i \mu_A + (1 - \delta_i)\mu_B + f(z_i)^t \beta, \quad i \ge 1, \tag{5.8}$$

where $f(\cdot)$ is a known vector function and $\beta$ is a vector of common regression
parameters. Put $\mathbb{F}_n = \left[f(z_i)^t\right]$, $F_n = [1_n : \mathbb{F}_n]$ and $b_n = (2\delta_n - 1_n)^t F_n$, where $1_n$ is the
$n$-dim vector of ones, $\delta_n^t = (\delta_1, \dots, \delta_n)$ and $b_n^t$ is usually called the imbalance vector.
Atkinson introduced his biased coin design by assigning the $(n+1)$st patient to A with
probability

$$
\Pr(\delta_{n+1} = 1 \mid \Im_n, Z_{n+1} = z_{n+1}) =
\frac{\left\{1 - (1; f(z_{n+1})^t)(F_n^t F_n)^{-1} b_n^t\right\}^2}
{\left\{1 - (1; f(z_{n+1})^t)(F_n^t F_n)^{-1} b_n^t\right\}^2 + \left\{1 + (1; f(z_{n+1})^t)(F_n^t F_n)^{-1} b_n^t\right\}^2}.
\tag{5.9}
$$

In order to avoid cumbersome notation, without loss of generality we assume a scalar
covariate and for simplicity we let $f(z) = z$. Since

$$
(1; z_{n+1})(F_n^t F_n)^{-1} b_n^t = (1; z_{n+1})
\begin{pmatrix} 1 & \bar{z}_n \\ \bar{z}_n & \overline{z^2}_n \end{pmatrix}^{-1}
\begin{pmatrix} 2\pi_n - 1 \\ 2\pi_n \bar{z}_{An} - \bar{z}_n \end{pmatrix},
$$

where $\bar{z}_n = n^{-1} \sum_{i=1}^{n} z_i$, $\overline{z^2}_n = n^{-1} \sum_{i=1}^{n} z_i^2$ and $\bar{z}_{An} = N_n^{-1} \sum_{i=1}^{n} \delta_i z_i$, rule (5.9)
corresponds to

$$
\varphi^{ATK}(\pi_n; S_n, f(z_{n+1})) =
\frac{\left[1 - (2a_n\pi_n - 1)\right]^2}{\left[1 - (2a_n\pi_n - 1)\right]^2 + \left[1 + (2a_n\pi_n - 1)\right]^2},
$$

with $a_n = 1 + (\bar{z}_n - \bar{z}_{An})(\bar{z}_n - z_{n+1}) / \left[\overline{z^2}_n - (\bar{z}_n)^2\right]$; thus, from (5.2), we obtain

$$\varphi^{ATK}\left(x; \varsigma, E_Z[f(Z)]\right) = \frac{(1 - x)^2}{(1 - x)^2 + x^2}, \tag{5.10}$$

which is decreasing in $x$; therefore the equation $\varphi^{ATK}(x; \varsigma, E_Z[f(Z)]) = x$ has a
unique solution $t^*_Z(\varsigma) = 1/2$. Since $\varphi^{ATK}(1/2; \varsigma, f(Z)) = 1/2$, then by Lemma 5.1
it follows that $\lim_{n \to \infty} \pi_n = 1/2$ a.s. Note that, even in the case of several covariates,
$(1; f(z_{n+1})^t)(F_n^t F_n)^{-1} b_n^t$ is still a linear function of $\pi_n$ and (5.10) follows after
straightforward matrix calculations.
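
The reduction of the quadratic form in Atkinson's rule to the expression 2 a_n pi_n - 1 can be checked on a small data set. The sketch below assumes a scalar covariate with f(z) = z, solves the 2 x 2 linear system by Cramer's rule, and compares the result with the closed form; the function names and the data are hypothetical.

```python
from typing import Sequence


def atkinson_linear_term(deltas: Sequence[int], zs: Sequence[float], z_next: float) -> float:
    """Compute (1; z_next) (F'F)^{-1} b' for the design matrix F = [1 : z]."""
    n = len(zs)
    s1 = sum(zs)
    s2 = sum(z * z for z in zs)
    b0 = sum(2 * d - 1 for d in deltas)                       # 2 N_n - n
    b1 = sum((2 * d - 1) * z for d, z in zip(deltas, zs))
    det = n * s2 - s1 * s1
    # Solve (F'F) u = b' by Cramer's rule, with F'F = [[n, s1], [s1, s2]].
    u0 = (s2 * b0 - s1 * b1) / det
    u1 = (n * b1 - s1 * b0) / det
    return u0 + z_next * u1


def a_coefficient(deltas: Sequence[int], zs: Sequence[float], z_next: float) -> float:
    """The coefficient a_n of Example 5.4 for a scalar covariate."""
    n = len(zs)
    z_bar = sum(zs) / n
    z2_bar = sum(z * z for z in zs) / n
    z_bar_a = sum(d * z for d, z in zip(deltas, zs)) / sum(deltas)
    return 1.0 + (z_bar - z_bar_a) * (z_bar - z_next) / (z2_bar - z_bar ** 2)


def phi_atk(linear_term: float) -> float:
    """Atkinson's probability as a function of the linear term."""
    return (1.0 - linear_term) ** 2 / ((1.0 - linear_term) ** 2 + (1.0 + linear_term) ** 2)


if __name__ == "__main__":
    deltas = [1, 0, 1, 1, 0, 0]
    zs = [0.2, 1.5, -0.7, 2.1, 0.4, -1.3]
    pi_n = sum(deltas) / len(deltas)
    direct = atkinson_linear_term(deltas, zs, 0.9)
    closed = 2.0 * a_coefficient(deltas, zs, 0.9) * pi_n - 1.0
    print(direct, closed, phi_atk(direct))
```

The first two printed values should coincide up to rounding, and the third is the resulting probability of assigning the next patient to A.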
Clearly, the same arguments still hold for the convergence to balance of Begg and
Iglewicz [8] and Smith's class of procedures [24, 25].

6. CARA designs with categorical covariates 

We now provide a convergence result for CARA designs in the case of categorical
covariates. In order to avoid cumbersome notation, from now on we assume without
loss of generality two categorical covariates, i.e. $Z = (T, W)$, with levels $t_j$
($j = 0, \dots, J$) and $w_l$ ($l = 0, \dots, L$), respectively. Also, let
$p = [p_{jl} : j = 0, \dots, J;\ l = 0, \dots, L]$ be the joint probability distribution of the categorical
covariates, with $p_{jl} > 0$ for any $j = 0, \dots, J$ and $l = 0, \dots, L$ and $\sum_{j=0}^{J} \sum_{l=0}^{L} p_{jl} = 1$.

After $n$ steps, let $N_n(j, l) = \sum_{i=1}^{n} \mathbb{1}_{\{Z_i = (t_j, w_l)\}}$ be the number of subjects within
the stratum $(t_j, w_l)$, $\tilde{N}_n(j, l) = \sum_{i=1}^{n} \delta_i \mathbb{1}_{\{Z_i = (t_j, w_l)\}}$ the number of allocations to A
within this stratum and $\pi_n(j, l)$ the corresponding proportion, i.e. $\pi_n(j, l) = N_n(j, l)^{-1} \tilde{N}_n(j, l)$,
for any $j = 0, \dots, J$ and $l = 0, \dots, L$. Also, let $\boldsymbol{\pi}_n = [\pi_n(j, l) : j = 0, \dots, J;\ l = 0, \dots, L]$.
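
The stratum-level quantities just introduced amount to simple bookkeeping, sketched below. The function name and the encoding of a covariate profile as a pair of level indices are assumptions made for illustration; only strata that have been observed at least once receive a proportion.

```python
from collections import defaultdict
from typing import Dict, Hashable, Sequence


def stratum_proportions(deltas: Sequence[int],
                        profiles: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Proportion of allocations to A within each observed stratum."""
    subjects = defaultdict(int)   # number of subjects in each stratum
    to_a = defaultdict(int)       # number of those subjects assigned to A
    for delta, profile in zip(deltas, profiles):
        subjects[profile] += 1
        to_a[profile] += delta
    return {profile: to_a[profile] / subjects[profile] for profile in subjects}


if __name__ == "__main__":
    deltas = [1, 0, 1, 1, 0]
    profiles = [(0, 0), (0, 0), (0, 1), (0, 1), (1, 1)]
    print(stratum_proportions(deltas, profiles))
```

Each dictionary entry corresponds to one component of the vector of within-stratum allocation proportions.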

After an initial stage with $m$ observations on each treatment, performed to derive a
non-trivial parameter estimation, we consider a class of CARA designs that assigns the
$(n+1)$st patient with covariate profile $Z_{n+1} = (t_j, w_l)$ to A with probability

\[
\Pr(\delta_{n+1} = 1 \mid \mathfrak{F}_n, Z_{n+1} = (t_j, w_l)) = \varphi_{jl}(\boldsymbol{\pi}_n; \hat{\gamma}_n, S_n), \quad \text{for } n \ge 2m, \tag{6.1}
\]

where $\mathfrak{F}_n = \sigma(\delta_1, \ldots, \delta_n; Y_1, \ldots, Y_n; Z_1, \ldots, Z_n)$ and $\varphi_{jl}$ is the allocation function
of the stratum $(t_j, w_l)$.

Let $\boldsymbol{\varphi}(\boldsymbol{\pi}_n; \hat{\gamma}_n, S_n) = [\varphi_{jl}(\boldsymbol{\pi}_n; \hat{\gamma}_n, S_n) : j = 0, \ldots, J; l = 0, \ldots, L]$. Often the
allocation rule at each stratum does not depend on the entire vector of allocation
proportions $\boldsymbol{\pi}_n$ involving all the strata, but depends only on the current allocation
proportion of this stratum, i.e.

\[
\varphi_{jl}(\boldsymbol{\pi}_n; \hat{\gamma}_n, S_n) = \varphi_{jl}(\pi_n(j,l); \hat{\gamma}_n, S_n), \quad \forall j = 0, \ldots, J; \ l = 0, \ldots, L. \tag{6.2}
\]

However, note that (6.2) does not correspond in general to a stratified randomization,
due to the fact that the estimate $\hat{\gamma}_n$ usually involves the information accrued from all
the strata up to that step, and thus the evolutions of the procedure at different strata are
not independent.

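Definition 6.1. Let $\boldsymbol{x} = [x_1, \ldots, x_K]$, where $x_\iota \in [0; 1]$ for any $\iota = 1, \ldots, K$ and
$K$ is a positive integer. Also, let $\psi_\iota(\boldsymbol{x}; y) : [0; 1]^K \times \mathbb{R}^d \to [0; 1]$ and set
$\boldsymbol{\psi}(\boldsymbol{x}; y) = [\psi_1(\boldsymbol{x}; y), \ldots, \psi_K(\boldsymbol{x}; y)]$. Then $\boldsymbol{t}(y) = [t_1(y), \ldots, t_K(y)]$, with
$t_\iota(y) : \mathbb{R}^d \to [0; 1]$ for $\iota = 1, \ldots, K$, is called a vectorial generalized downcrossing
of $\boldsymbol{\psi}$ if for all $y \in \mathbb{R}^d$ and for any $\iota = 1, \ldots, K$

for all $x_\iota < t_\iota(y)$, $\psi_\iota(\boldsymbol{x}; y) \ge t_\iota(y)$ and for all $x_\iota > t_\iota(y)$, $\psi_\iota(\boldsymbol{x}; y) \le t_\iota(y)$.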

Clearly, if the function $\psi_\iota(\boldsymbol{x}; y)$ is decreasing in $\boldsymbol{x}$ (i.e. componentwise) for any $\iota$,
then the vectorial generalized downcrossing $\boldsymbol{t}(y)$ is unique, with $\boldsymbol{t}(y) \in (0; 1)^K$ for any
$y \in \mathbb{R}^d$; furthermore $\boldsymbol{\psi}(\boldsymbol{t}(y); y) = \boldsymbol{t}(y)$, provided that the solution exists. Moreover,
note that if $\psi_\iota(\boldsymbol{x}; y) = \psi_\iota(x_\iota; y)$ for any $\iota = 1, \ldots, K$, then each component $t_\iota(y)$ of
$\boldsymbol{t}(y)$ is simply the single generalized downcrossing of $\psi_\iota(x_\iota; y)$, which can be found
by solving the equation $\psi_\iota(x_\iota; y) = x_\iota$ (if the solution exists).
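When each component depends only on its own coordinate, the single generalized downcrossing can be located numerically. The following is a minimal sketch, assuming a non-increasing allocation function on [0, 1] (possibly with jumps); the function names `downcrossing`, `atkinson_type` and `efron_type` are illustrative and do not come from the paper.

```python
from typing import Callable


def downcrossing(psi: Callable[[float], float], tol: float = 1e-10) -> float:
    """Approximate t in [0, 1] with psi(x) >= t for x < t and psi(x) <= t for x > t.

    Assumes psi maps [0, 1] into [0, 1] and is non-increasing, so that
    psi(x) - x changes sign exactly once; psi is allowed to be discontinuous.
    """
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if psi(mid) >= mid:   # mid is at, or to the left of, the crossing
            lo = mid
        else:                 # mid is to the right of the crossing
            hi = mid
    return 0.5 * (lo + hi)


def atkinson_type(x: float) -> float:
    """Continuous decreasing rule (1 - x)^2 / ((1 - x)^2 + x^2)."""
    return (1.0 - x) ** 2 / ((1.0 - x) ** 2 + x ** 2)


def efron_type(x: float, p: float = 2.0 / 3.0) -> float:
    """Discontinuous biased-coin rule: p below 1/2, 1/2 at 1/2, 1 - p above."""
    if x < 0.5:
        return p
    if x > 0.5:
        return 1.0 - p
    return 0.5
```

Bisection is used here instead of a root finder because the definition only requires a sign change of psi(x) - x, which also covers discontinuous rules for which the equation psi(x) = x need not have a solution.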

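Theorem 6.1. At each step $n$, suppose that for any given stratum $(t_j, w_l)$ the allocation
function $\varphi_{jl}(\boldsymbol{\pi}_n; \hat{\gamma}_n, S_n)$ is decreasing in $\boldsymbol{\pi}_n$ (componentwise). If the unique
vectorial generalized downcrossing $\boldsymbol{t}(\hat{\gamma}_n, S_n) = [t_{jl}(\hat{\gamma}_n, S_n) : j = 0, \ldots, J; l = 0, \ldots, L]$
is a continuous function and $\boldsymbol{\varphi}(\boldsymbol{t}(\gamma, \varsigma); \gamma, \varsigma) = \boldsymbol{t}(\gamma, \varsigma)$, then

\[
\lim_{n\to\infty} \boldsymbol{\pi}_n = \boldsymbol{t}(\gamma, \varsigma) \ \text{a.s.} \quad \text{and} \quad
\lim_{n\to\infty} \pi_n = E_Z[t(\gamma, \varsigma)] = \sum_{j=0}^{J}\sum_{l=0}^{L} t_{jl}(\gamma, \varsigma)\, p_{jl} \ \text{a.s.}
\]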

Proof. See Appendix A.3. □ 

Example 6.1. The Reinforced Doubly-adaptive Biased Coin Design (RDBCD) is a
class of CARA procedures recently introduced by Baldi Antognini and Zagoraiou [6] in
the case of categorical covariates intended to target any desired allocation proportion

\[
\boldsymbol{\pi}^*(\gamma) = [\pi^*(j,l) : j = 0, \ldots, J; l = 0, \ldots, L] \in (0, 1)^{(J+1)\times(L+1)},
\]

which is a continuous function of the unknown model parameters. Starting with a pilot
stage performed to derive an initial parameter estimation, at each step $n \ge 2m$ let
$\hat{\pi}^*_n(j,l)$ be the estimate of the target within stratum $(t_j, w_l)$ obtained using all the
collected data up to that step and $\hat{p}_{jln} = n^{-1}N_n(j,l)$ the estimate of $p_{jl}$; when the
next patient with covariate $Z_{n+1} = (t_j, w_l)$ is ready to be randomized, the RDBCD
assigns him/her to A with probability

\[
\Pr(\delta_{n+1} = 1 \mid \mathfrak{F}_n, Z_{n+1} = (t_j, w_l)) = \varphi_{jl}(\pi_n(j,l); \hat{\pi}^*_n(j,l), \hat{p}_{jln}),
\]

where the function $\varphi_{jl}(x; y, z) : (0, 1)^3 \to [0, 1]$ satisfies the following conditions:

i) $\varphi_{jl}$ is decreasing in $x$ and increasing in $y$, for any $z \in (0, 1)$;

ii) $\varphi_{jl}(x; x, z) = x$ for any $z \in (0, 1)$;

iii) $\varphi_{jl}$ is decreasing in $z$ if $x < y$, and increasing in $z$ if $x > y$;

iv) $\varphi_{jl}(x; y, z) = 1 - \varphi_{jl}(1 - x; 1 - y, z)$ for any $z \in (0, 1)$.

Firstly observe that for the RDBCD (6.2) holds and thus, from i) and ii), at each stratum
$(t_j, w_l)$ the only generalized downcrossing of $\varphi_{jl}$ is simply given by $\hat{\pi}^*_n(j,l)$.
Therefore, by Theorem 6.1, $\lim_{n\to\infty} \pi_n(j,l) = \pi^*(j,l)$ a.s. for any $j = 0, \ldots, J$ and
$l = 0, \ldots, L$, due to the continuity of the target, i.e. $\lim_{n\to\infty} \boldsymbol{\pi}_n = \boldsymbol{\pi}^*(\gamma)$ a.s.
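Conditions i) to iv) can be checked numerically for a candidate allocation function. The sketch below uses a hypothetical function `phi` chosen only for illustration; it is an assumption of this example and is not the function proposed in [6]. The helper `satisfies_conditions` is likewise illustrative and only tests the conditions on a finite grid.

```python
def phi(x: float, y: float, z: float) -> float:
    """Hypothetical allocation function on (0,1)^3; NOT the one proposed in [6]."""
    k = 1.0 / z                      # smaller z gives a stronger push toward y
    if x <= y:
        return 1.0 - (1.0 - y) * (x / y) ** k
    return y * ((1.0 - x) / (1.0 - y)) ** k


def satisfies_conditions(grid, tol=1e-9):
    """Check conditions i)-iv) of Example 6.1 on a finite grid of points in (0,1)."""
    pairs = list(zip(grid, grid[1:]))            # consecutive grid points a < b
    for z in grid:
        for u in grid:
            if abs(phi(u, u, z) - u) > tol:                       # ii) phi(x; x, z) = x
                return False
            for x in grid:                                        # iv) symmetry
                if abs(phi(x, u, z) - (1.0 - phi(1.0 - x, 1.0 - u, z))) > tol:
                    return False
            for a, b in pairs:
                if phi(a, u, z) < phi(b, u, z) - tol:             # i) decreasing in x
                    return False
                if phi(u, a, z) > phi(u, b, z) + tol:             # i) increasing in y
                    return False
    for x in grid:
        for y in grid:
            for a, b in pairs:                                    # iii) behaviour in z
                if x < y and phi(x, y, a) < phi(x, y, b) - tol:   # decreasing in z
                    return False
                if x > y and phi(x, y, a) > phi(x, y, b) + tol:   # increasing in z
                    return False
    return True
```

A call such as `satisfies_conditions([i / 10 for i in range(1, 10)])` runs the check on a coarse grid; passing on a grid is of course only a sanity check, not a proof that the conditions hold everywhere.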

6.1. Covariate-Adaptive designs with categorical covariates 

Theorem 6.1 can be naturally applied to CA procedures in the case of categorical
covariates by assuming, instead of (6.1), the following class of allocation rules:

\[
\Pr(\delta_{n+1} = 1 \mid \mathfrak{F}_n, Z_{n+1} = z_{n+1}) = \varphi_{jl}(\boldsymbol{\pi}_n; S_n), \tag{6.3}
\]

where now $\mathfrak{F}_n = \sigma(\delta_1, \ldots, \delta_n; Z_1, \ldots, Z_n)$. Moreover, from now on we let
$\boldsymbol{t}^B = [1/2 : j = 0, \ldots, J; l = 0, \ldots, L]$.
Example 6.2. The Covariate-Adaptive Biased Coin Design (C-ABCD) [5] is a class of
stratified randomization procedures intended to achieve joint balance. For any stratum
$(t_j, w_l)$, let $F_{jl}(\cdot) : \mathbb{R} \to [0, 1]$ be a non-increasing and symmetric function with
$F_{jl}(-x) = 1 - F_{jl}(x)$; the C-ABCD assigns the $(n+1)$st patient with profile
$Z_{n+1} = (t_j, w_l)$ to A with probability

\[
\Pr(\delta_{n+1} = 1 \mid \mathfrak{F}_n, Z_{n+1} = (t_j, w_l)) = F_{jl}[D_n(j,l)], \tag{6.4}
\]

where $D_n(j,l) = N_n(j,l)[2\pi_n(j,l) - 1]$ is the imbalance between the two groups after
$n$ steps within stratum $(t_j, w_l)$. As shown in Remark 3.2 and Example 3.4 in the
case of AA procedures, Theorem 6.1 still holds even if we assume different randomization
functions at each step, provided that the unique vectorial generalized downcrossing
is the same for any $n$. Indeed, it is trivial to see that rule (6.4) corresponds to

\[
\varphi_{jln}(\boldsymbol{\pi}_n; S_n) = \varphi_{jln}(\pi_n(j,l); S_n) = F_{jl}\{n[2\pi_n(j,l) - 1]\hat{p}_{jln}\},
\]

and, from the properties of $F_{jl}$, the $\varphi_{jln}$'s have 1/2 as unique downcrossing for any $n$;
thus $\lim_{n\to\infty} \boldsymbol{\pi}_n = \boldsymbol{t}^B$, which clearly implies marginal balance.

Moreover, when the covariate distribution is known Baldi Antognini and Zagoraiou
[5] suggested the following class of randomization rules:

\[
F_{jl}(x) = \{x^{q(p_{jl})} + 1\}^{-1}, \quad x \ge 1,
\]

where $q(\cdot)$ is a decreasing function with $\lim_{t\to 0^+} q(t) = \infty$. Clearly, the above
mentioned arguments and Theorem 6.1 guarantee the convergence to balance even if the
covariate distribution is unknown, by replacing at each step $p_{jl}$ with its current
estimate.
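The stratified rule (6.4) is easy to simulate. The sketch below is an illustration under stated assumptions: the exponent is held fixed at a hypothetical value `a = 2.0` in every stratum (rather than depending on the stratum through q), the covariate profiles are drawn independently from a user-supplied distribution, and the names `F` and `simulate_cabcd` are not from the paper.

```python
import random


def F(d: int, a: float = 2.0) -> float:
    """Non-increasing with F(-d) = 1 - F(d); F(d) = 1 / (d^a + 1) for d >= 1."""
    if d == 0:
        return 0.5
    if d > 0:
        return 1.0 / (d ** a + 1.0)
    return 1.0 - F(-d, a)


def simulate_cabcd(n: int, probs: dict, seed: int = 0) -> dict:
    """Simulate n assignments; return the proportion allocated to A per stratum.

    probs maps each stratum label, e.g. (j, l), to its probability p_jl.
    """
    rng = random.Random(seed)
    strata = list(probs)
    weights = [probs[s] for s in strata]
    count = {s: 0 for s in strata}   # N_n(j, l): subjects in the stratum
    to_a = {s: 0 for s in strata}    # allocations to A in the stratum
    for _ in range(n):
        s = rng.choices(strata, weights)[0]
        d = 2 * to_a[s] - count[s]   # D_n(j, l) = N_n(j, l) [2 pi_n(j, l) - 1]
        if rng.random() < F(d):
            to_a[s] += 1
        count[s] += 1
    return {s: to_a[s] / count[s] for s in strata if count[s] > 0}
```

For a long run, every returned proportion should be close to 1/2, in line with the convergence to joint balance stated above; the simulation only illustrates the result and does not replace the argument based on Theorem 6.1.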

Examples 6.1 and 6.2 deal with procedures such that, at every step $n$, the allocation
rule $\varphi_{jl}$ depends only on the current allocation proportion $\pi_n(j,l)$, namely satisfying
(6.2). We now present additional examples where $\varphi_{jl}$ is a function of the whole
vectorial allocation proportion $\boldsymbol{\pi}_n$.

Example 6.3. Minimization methods [19, 27] are stratified randomization procedures
intended to achieve the so-called marginal balance among covariates. In general, they
depend on the definition of a measure of overall imbalance among the assignments
which summarizes the imbalances between the treatment groups for each level of every
factor. Assuming the well-known variance method proposed by Pocock and Simon
(1975), the $(n+1)$st subject with covariate profile $Z_{n+1} = (t_j, w_l)$ is assigned to
treatment A with probability

\[
\Pr(\delta_{n+1} = 1 \mid \mathfrak{F}_n, Z_{n+1} = (t_j, w_l)) =
\begin{cases}
p, & D_n(t_j) + D_n(w_l) < 0, \\
1/2, & D_n(t_j) + D_n(w_l) = 0, \\
1 - p, & D_n(t_j) + D_n(w_l) > 0,
\end{cases} \tag{6.5}
\]

where $p \in [1/2; 1]$, $D_n(t_j)$ is the imbalance between the two arms within the level $t_j$
of $T$ and, similarly, $D_n(w_l)$ represents the imbalance at the category $w_l$ of $W$. At each
step $n$, note that $\mathrm{sgn}\{D_n(t_j)\} = \mathrm{sgn}\{n^{-1}D_n(t_j)\}$ where

\[
n^{-1}D_n(t_j) = \sum_{l=0}^{L} [2\pi_n(j,l) - 1]\,\hat{p}_{jln}, \quad \text{for any } j = 0, \ldots, J \tag{6.6}
\]

and analogously for $D_n(w_l)$. Thus, allocation rule (6.5) corresponds to

\[
\varphi^{PS}_{jl}(\boldsymbol{\pi}_n; S_n) =
\begin{cases}
p, & \sum_{l=0}^{L} [\pi_n(j,l) - \tfrac{1}{2}]\hat{p}_{jln} + \sum_{j=0}^{J} [\pi_n(j,l) - \tfrac{1}{2}]\hat{p}_{jln} < 0, \\
1/2, & \sum_{l=0}^{L} [\pi_n(j,l) - \tfrac{1}{2}]\hat{p}_{jln} + \sum_{j=0}^{J} [\pi_n(j,l) - \tfrac{1}{2}]\hat{p}_{jln} = 0, \\
1 - p, & \sum_{l=0}^{L} [\pi_n(j,l) - \tfrac{1}{2}]\hat{p}_{jln} + \sum_{j=0}^{J} [\pi_n(j,l) - \tfrac{1}{2}]\hat{p}_{jln} > 0,
\end{cases}
\]

and therefore the problem consists in finding the vectorial generalized downcrossing
of $\boldsymbol{\varphi}^{PS}(\boldsymbol{\pi}_n; S_n) = [\varphi^{PS}_{jl}(\boldsymbol{\pi}_n; S_n) : j = 0, \ldots, J; l = 0, \ldots, L]$. Since at each step
$n$, $\varphi^{PS}_{jl}(\boldsymbol{\pi}_n; S_n)$ is decreasing in $\pi_n(j,l)$ for any $j = 0, \ldots, J$ and $l = 0, \ldots, L$,
then the vectorial generalized downcrossing is unique. It is straightforward to see that
$\boldsymbol{\varphi}^{PS}(\boldsymbol{t}^B; \varsigma) = \boldsymbol{t}^B$ for every $n$ and thus $\lim_{n\to\infty} \boldsymbol{\pi}_n = \boldsymbol{t}^B$ a.s.
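Rule (6.5) can be simulated directly from the two marginal imbalances. The following sketch assumes, purely for illustration, that covariate profiles are drawn uniformly over the strata and that every level is observed at least once; the function name `simulate_pocock_simon` and its parameters are hypothetical.

```python
import random
from itertools import product


def simulate_pocock_simon(n, J, L, p=0.8, seed=0):
    """Simulate rule (6.5); return the proportion allocated to A per level of T and of W.

    Assumes n is large enough for every level of T and W to be observed.
    """
    rng = random.Random(seed)
    strata = list(product(range(J + 1), range(L + 1)))
    d_t, n_t = [0] * (J + 1), [0] * (J + 1)   # D_n(t_j) and subjects at level t_j
    d_w, n_w = [0] * (L + 1), [0] * (L + 1)   # D_n(w_l) and subjects at level w_l
    for _ in range(n):
        j, l = rng.choice(strata)             # assumed uniform covariate distribution
        total = d_t[j] + d_w[l]
        prob_a = p if total < 0 else (1.0 - p if total > 0 else 0.5)
        step = 1 if rng.random() < prob_a else -1   # +1 for A, -1 for B
        d_t[j] += step
        d_w[l] += step
        n_t[j] += 1
        n_w[l] += 1
    # A level with N subjects and imbalance D has (N + D) / 2 allocations to A.
    prop_t = [(n_t[j] + d_t[j]) / (2 * n_t[j]) for j in range(J + 1)]
    prop_w = [(n_w[l] + d_w[l]) / (2 * n_w[l]) for l in range(L + 1)]
    return prop_t, prop_w
```

Only the marginal imbalances are tracked, since the rule never looks at the within-stratum counts; this mirrors the fact that the allocation function depends on the whole vector of proportions only through the two marginal sums in (6.6).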

Example 6.4. In order to include minimization methods and stratified randomization
procedures in a unique framework, Hu and Hu (2012) have recently suggested assigning
subject $(n+1)$ belonging to the stratum $(t_j, w_l)$ to A with probability

\[
\Pr(\delta_{n+1} = 1 \mid \mathfrak{F}_n, Z_{n+1} = (t_j, w_l)) =
\begin{cases}
p, & \bar{D}_n(j,l) < 0, \\
1/2, & \bar{D}_n(j,l) = 0, \\
1 - p, & \bar{D}_n(j,l) > 0,
\end{cases} \tag{6.7}
\]

where the overall measure of imbalance

\[
\bar{D}_n(j,l) = \omega_g D_n + \omega_T D_n(t_j) + \omega_W D_n(w_l) + \omega_s D_n(j,l)
\]

is a weighted average of the three types of imbalances actually observed (global,
marginal and within-stratum), with non-negative weights $\omega_g$ (global), $\omega_T$ and $\omega_W$
(covariate marginal) and $\omega_s$ (stratum) chosen such that $\omega_g + \omega_T + \omega_W + \omega_s = 1$.
By choosing the weights $\omega_g$, $\omega_T$, $\omega_W$ such that

\[
(JL + J + L)\,\omega_g + J\,\omega_W + L\,\omega_T < 1/2, \tag{6.8}
\]

the authors proved that the probabilistic structure of the within-stratum imbalance is
that of a positive recurrent Markov chain and this implies that procedure (6.7) is
asymptotically balanced, both marginally and jointly. However, as stated by the authors, only
strictly positive choices of the stratum weight $\omega_s$ satisfy (6.8), and thus their result
cannot be applied to Pocock and Simon's minimization method.

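The asymptotic behaviour of Hu and Hu's design can be illustrated in a different
way by applying Theorem 6.1. Since $\mathrm{sgn}\{\bar{D}_n(j,l)\} = \mathrm{sgn}\{n^{-1}\bar{D}_n(j,l)\}$ and

\[
n^{-1}D_n = 2\pi_n - 1 = \sum_{j=0}^{J}\sum_{l=0}^{L} [2\pi_n(j,l) - 1]\,\hat{p}_{jln}, \tag{6.9}
\]

from (6.6) it follows that

\[
\mathrm{sgn}\{n^{-1}\bar{D}_n(j,l)\} = \mathrm{sgn}\Big\{
\omega_g \sum_{j=0}^{J}\sum_{l=0}^{L} [\pi_n(j,l) - \tfrac{1}{2}]\hat{p}_{jln}
+ \omega_T \sum_{l=0}^{L} [\pi_n(j,l) - \tfrac{1}{2}]\hat{p}_{jln}
+ \omega_W \sum_{j=0}^{J} [\pi_n(j,l) - \tfrac{1}{2}]\hat{p}_{jln}
+ \omega_s [\pi_n(j,l) - \tfrac{1}{2}]\hat{p}_{jln}
\Big\}.
\]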



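Thus, at each step $n$ procedure (6.7) corresponds to an allocation rule $\boldsymbol{\varphi}^{HH}(\boldsymbol{\pi}_n; S_n)$
which is decreasing in $\pi_n(j,l)$ for any $j = 0, \ldots, J$ and $l = 0, \ldots, L$. Since
$\boldsymbol{\varphi}^{HH}(\boldsymbol{t}^B; \varsigma) = \boldsymbol{t}^B$, then the unique vectorial generalized downcrossing is $\boldsymbol{t}^B$ for any $n$
and therefore $\lim_{n\to\infty} \boldsymbol{\pi}_n = \boldsymbol{t}^B$ a.s.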
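The weighted imbalance behind rule (6.7) can be computed from the table of within-stratum imbalances alone. The sketch below is an illustration, not the authors' implementation; the function `hu_hu_probability`, its argument layout (`diff[j][l]` holding the within-stratum imbalance, A minus B) and the default `p = 0.8` are assumptions of this example.

```python
def hu_hu_probability(diff, j, l, w_g, w_t, w_w, w_s, p=0.8):
    """Probability of assigning the next subject in stratum (j, l) to A under (6.7).

    diff[j][l] is the current within-stratum imbalance (allocations to A minus
    allocations to B); the four weights are non-negative and sum to one.
    """
    d_global = sum(sum(row) for row in diff)   # global imbalance D_n
    d_t = sum(diff[j])                         # marginal imbalance at level t_j
    d_w = sum(row[l] for row in diff)          # marginal imbalance at level w_l
    overall = w_g * d_global + w_t * d_t + w_w * d_w + w_s * diff[j][l]
    if overall < 0:
        return p
    if overall > 0:
        return 1.0 - p
    return 0.5
```

With `w_g = w_s = 0` and `w_t = w_w = 1/2` the sign of the overall imbalance coincides with the sign of the sum of the two marginal imbalances, so the rule reduces to the Pocock and Simon rule (6.5); with `w_s = 1` it becomes a stratified biased coin.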

By the same arguments, one can easily prove the convergence to balance of
several extensions of minimization methods (see e.g. [12, 23]), since at each step $n$
every type of imbalance (global, marginal and within-stratum) is a linear combination
of the allocation proportions $\pi_n(j,l)$.

Example 6.5. Assuming model (5.8) with all the interaction effects among covariates,
then

\[
b_n^t = (D_n, D_n(t_1), \ldots, D_n(t_J), D_n(w_1), \ldots, D_n(w_L), D_n(1,1), \ldots, D_n(J,L))
\]

and, as shown in [5], Atkinson's procedure (5.9) becomes a stratified randomization
rule with



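\[
\Pr(\delta_{n+1} = 1 \mid \mathfrak{F}_n, Z_{n+1} = (t_j, w_l)) =
\frac{\left[1 - \frac{D_n(j,l)}{N_n(j,l)}\right]^2}{\left[1 - \frac{D_n(j,l)}{N_n(j,l)}\right]^2 + \left[1 + \frac{D_n(j,l)}{N_n(j,l)}\right]^2}. \tag{6.10}
\]

Clearly, rule (6.10) corresponds to

\[
\varphi_{jl}(\boldsymbol{\pi}_n; S_n) = \frac{[1 - \pi_n(j,l)]^2}{[1 - \pi_n(j,l)]^2 + \pi_n(j,l)^2},
\]

so (6.2) holds; thus, by Theorem 6.1, $\lim_{n\to\infty} \boldsymbol{\pi}_n = \boldsymbol{t}^B$.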
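The passage from (6.10) to the stratum-wise allocation function can be verified in a few lines. The worked step below is a sketch that assumes rule (6.10) depends on the data only through the ratio $D_n(j,l)/N_n(j,l)$, in the same Atkinson form as in Section 5, and writes $x$ for $\pi_n(j,l)$ (a shorthand introduced here).

```latex
\begin{align*}
\frac{D_n(j,l)}{N_n(j,l)} &= 2x - 1
  \quad\Longrightarrow\quad
  1 - \frac{D_n(j,l)}{N_n(j,l)} = 2(1 - x), \qquad
  1 + \frac{D_n(j,l)}{N_n(j,l)} = 2x, \\
\varphi_{jl}(x) &= \frac{[2(1-x)]^2}{[2(1-x)]^2 + (2x)^2}
  = \frac{(1-x)^2}{(1-x)^2 + x^2}, \\
\varphi_{jl}'(x) &= -\frac{2x(1-x)}{[(1-x)^2 + x^2]^2} \le 0 \quad \text{on } [0,1], \qquad
\varphi_{jl}(1/2) = \frac{1/4}{1/4 + 1/4} = \frac{1}{2}.
\end{align*}
```

The first line uses $D_n(j,l) = N_n(j,l)[2\pi_n(j,l) - 1]$ from Example 6.2; the last line shows that the function is decreasing with fixed point 1/2, which is what Theorem 6.1 requires.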

When the model is not full, then $b_n$ contains all the imbalance terms corresponding to
the included interactions. Thus, from (6.6) and (6.9), $(1; f(z_{n+1})^t)(F_n^t F_n)^{-1} b_n$ is a
linear function of the allocation proportion $\boldsymbol{\pi}_n$, so that Theorem 6.1 can be applied by
the previous arguments.



Appendix A 

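A.1. Proof of Theorem 4.1

At each step $n$, consider the square integrable martingale process $\{M_n; \mathfrak{F}_n\}$, where
$M_n = \sum_{i=1}^{n} \Delta M_i = \sum_{i=1}^{n} \{\delta_i - E(\delta_i \mid \mathfrak{F}_{i-1})\}$ and $\mathfrak{F}_n = \sigma(\delta_1, \ldots, \delta_n; Y_1, \ldots, Y_n)$.
Let $\lambda_n = \max\{s : 2m + 1 \le s \le n, \ \pi_s \le t(\hat{\gamma}_s)\}$, with $\max\emptyset = 2m$. Thus at each
step $i > \lambda_n$, $\varphi^{RA}(\pi_i; \hat{\gamma}_i) \le t(\hat{\gamma}_i)$ and therefore

\begin{align*}
N_n &= N_{\lambda_n+1} + \sum_{k=\lambda_n+2}^{n} \Delta M_k + \sum_{k=\lambda_n+2}^{n} \varphi^{RA}(\pi_{k-1}; \hat{\gamma}_{k-1}) \\
&\le N_{\lambda_n} + 1 + M_n - M_{\lambda_n+1} + \sum_{k=\lambda_n+2}^{n} t(\hat{\gamma}_{k-1}).
\end{align*}

Since $N_{\lambda_n} \le \lambda_n t(\hat{\gamma}_{\lambda_n})$ we obtain

\begin{align*}
N_n - n\,t(\hat{\gamma}_n) \le{} & \Big(\lambda_n t(\hat{\gamma}_{\lambda_n}) - \sum_{k=2}^{\lambda_n+1} t(\hat{\gamma}_{k-1})\Big) + M_n - M_{\lambda_n+1} + 1 - t(\hat{\gamma}_0) \\
& - \Big(n\,t(\hat{\gamma}_n) - \sum_{k=1}^{n} t(\hat{\gamma}_{k-1})\Big),
\end{align*}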

where $t(\hat{\gamma}_0) = t_0 \in [0; 1]$ is a constant depending on the initial stage. Furthermore, as
$n \to \infty$, at least one of the numbers of assignments to the treatments, namely $N_n$
and $(n - N_n)$, tends to infinity a.s. As shown in [17], in any case $\hat{\gamma}_n$ has finite limit
so that, from the properties of $t(\hat{\gamma}_n)$, there exists a $v \in (0, 1)$ such that

\[
t(\hat{\gamma}_n) \to v \quad \text{a.s.} \tag{A.1}
\]

and so $\lim_{n\to\infty} \big[t(\hat{\gamma}_n) - n^{-1}\sum_{k=1}^{n} t(\hat{\gamma}_{k-1})\big] = 0$ a.s. As $n \to \infty$, then $\lambda_n \to \infty$ or
$\sup_n \lambda_n < \infty$; in either case, $\lim_{n\to\infty} n^{-1}\lambda_n \big[t(\hat{\gamma}_{\lambda_n}) - \lambda_n^{-1}\sum_{k=1}^{\lambda_n} t(\hat{\gamma}_k)\big] = 0$ a.s.
and therefore

\[
[\pi_n - t(\hat{\gamma}_n)]^+ \to 0 \quad \text{a.s.} \tag{A.2}
\]

Analogously,

\[
[(1 - \pi_n) - (1 - t(\hat{\gamma}_n))]^+ \to 0 \quad \text{a.s.} \tag{A.3}
\]

From (A.2) and (A.3), as $n$ tends to infinity $\pi_n - t(\hat{\gamma}_n) \to 0$ a.s. and by (A.1)
$\lim_{n\to\infty} \pi_n = \lim_{n\to\infty} t(\hat{\gamma}_n) = v$ a.s. Since $0 < v < 1$, then $0 < 1 - v < 1$ and thus
$N_n \to \infty$ a.s. and $(n - N_n) \to \infty$ a.s. Therefore, $\lim_{n\to\infty} \hat{\gamma}_n = \gamma$
a.s. and from the continuity of the downcrossing $\lim_{n\to\infty} t(\hat{\gamma}_n) = t(\gamma) = v$ a.s., i.e.

$\lim_{n\to\infty} \pi_n = t(\gamma)$ a.s.



A.2. Proof of Theorem 5.1 

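If $\varphi^{CARA}$ is decreasing in $\pi_n$ then $\varphi_Z$ is also decreasing in $\pi_n$, so that the generalized
downcrossing is unique and lies in $(0; 1)$. Letting now $\mathfrak{F}_n = \sigma(\delta_1, \ldots, \delta_n; Y_1, \ldots, Y_n; Z_1, \ldots, Z_n)$,
then $E(\delta_i \mid \mathfrak{F}_{i-1}) = E_{Z_i}[\varphi^{CARA}(\pi_{i-1}; \hat{\gamma}_{i-1}, S_{i-1}, f(Z_i))]$ and $\Delta M_i = \delta_i - E(\delta_i \mid \mathfrak{F}_{i-1})$.
Then $\{\Delta M_i; i \ge 1\}$ is a sequence of bounded martingale differences with $|\Delta M_i| \le 1$
for any $i \ge 1$; thus $\{M_n = \sum_{i=1}^{n} \Delta M_i; \mathfrak{F}_n\}$ is a martingale with
$\sum_{k=1}^{n} E[(\Delta M_k)^2 \mid \mathfrak{F}_{k-1}] \le n$. Let $\zeta_n = \max\{\vartheta : 2m + 1 \le \vartheta \le n, \ \pi_\vartheta \le t_Z(\hat{\gamma}_\vartheta, S_\vartheta)\}$,
with $\max\emptyset = 2m$, so that for all $i > \zeta_n$ we have $\varphi_Z(\pi_i; \hat{\gamma}_i, S_i) \le t_Z(\hat{\gamma}_i, S_i)$. Note that

\begin{align*}
N_n &= N_{\zeta_n+1} + \sum_{k=\zeta_n+2}^{n} \Delta M_k + \sum_{k=\zeta_n+2}^{n} E(\delta_k \mid \mathfrak{F}_{k-1}) \\
&\le N_{\zeta_n} + 1 + M_n - M_{\zeta_n+1} + \sum_{k=\zeta_n+2}^{n} \varphi_Z(\pi_{k-1}; \hat{\gamma}_{k-1}, S_{k-1}) \\
&\le N_{\zeta_n} + 1 + M_n - M_{\zeta_n+1} + \sum_{k=\zeta_n+2}^{n} t_Z(\hat{\gamma}_{k-1}, S_{k-1}) \\
&= N_{\zeta_n} + 1 + M_n - M_{\zeta_n+1} + \sum_{k=1}^{n} t_Z(\hat{\gamma}_{k-1}, S_{k-1}) - \sum_{k=1}^{\zeta_n+1} t_Z(\hat{\gamma}_{k-1}, S_{k-1}).
\end{align*}

Since $N_{\zeta_n} \le \zeta_n t_Z(\hat{\gamma}_{\zeta_n}, S_{\zeta_n})$, then

\begin{align*}
N_n - n\,t_Z(\hat{\gamma}_n, S_n) \le{} & \Big(\zeta_n t_Z(\hat{\gamma}_{\zeta_n}, S_{\zeta_n}) - \sum_{k=2}^{\zeta_n+1} t_Z(\hat{\gamma}_{k-1}, S_{k-1})\Big) \\
& + M_n - M_{\zeta_n+1} + 1 - t_Z(\hat{\gamma}_0, S_0) - \Big(n\,t_Z(\hat{\gamma}_n, S_n) - \sum_{k=1}^{n} t_Z(\hat{\gamma}_{k-1}, S_{k-1})\Big).
\end{align*}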

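Moreover, as $n \to \infty$, at least one of the numbers of assignments to the treatments,
namely $N_n$ and $(n - N_n)$, tends to infinity a.s. In any case from the properties of
$t_Z(\hat{\gamma}_n, S_n)$, there exists a $v \in (0, 1)$ such that

\[
t_Z(\hat{\gamma}_n, S_n) \to v \quad \text{a.s.} \tag{A.4}
\]

and so

\[
t_Z(\hat{\gamma}_n, S_n) - \frac{1}{n}\sum_{k=1}^{n} t_Z(\hat{\gamma}_{k-1}, S_{k-1}) \to 0 \quad \text{a.s.}
\]

As $n \to \infty$, then $\zeta_n \to \infty$ or $\sup_n \zeta_n < \infty$; in either case,

\[
\frac{\zeta_n}{n}\Big[t_Z(\hat{\gamma}_{\zeta_n}, S_{\zeta_n}) - \frac{1}{\zeta_n}\sum_{k=1}^{\zeta_n} t_Z(\hat{\gamma}_k, S_k)\Big] \to 0 \quad \text{a.s.}
\]

and therefore

\[
[\pi_n - t_Z(\hat{\gamma}_n, S_n)]^+ \to 0 \quad \text{a.s.} \tag{A.5}
\]

Analogously,

\[
[(1 - \pi_n) - (1 - t_Z(\hat{\gamma}_n, S_n))]^+ \to 0 \quad \text{a.s.} \tag{A.6}
\]

From (A.5) and (A.6), $\lim_{n\to\infty} [\pi_n - t_Z(\hat{\gamma}_n, S_n)] = 0$ a.s. and therefore by (A.4)
$\lim_{n\to\infty} \pi_n = \lim_{n\to\infty} t_Z(\hat{\gamma}_n, S_n) = v$ a.s. Since $0 < v < 1$, then $0 < 1 - v < 1$
and $N_n \to \infty$ a.s. and $(n - N_n) \to \infty$ a.s. Therefore,
$\hat{\gamma}_n \to \gamma$ a.s. and also $S_n \to \varsigma$ a.s., so that from the continuity of
the downcrossing $\lim_{n\to\infty} t_Z(\hat{\gamma}_n, S_n) = t_Z(\gamma, \varsigma) = v$ a.s., namely
$\lim_{n\to\infty} \pi_n = t_Z(\gamma, \varsigma)$ a.s.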
A.3. Proof of Theorem 6.1

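At each step $n$, let $M_n(j,l) = \sum_{i=1}^{n} \Delta M_i(j,l) = \sum_{i=1}^{n} \{\delta_i - E(\delta_i \mid \mathfrak{G}_{i-1})\} 1_{\{Z_i = (t_j, w_l)\}}$,
where $\mathfrak{G}_n = \sigma(\mathfrak{F}_n, Z_{n+1})$. Therefore, at each stratum $(t_j, w_l)$, $\{\Delta M_i(j,l); i \ge 1\}$ is
a sequence of bounded martingale differences with $|\Delta M_i(j,l)| \le 1$ for any $i \ge 1$ and
thus $\{M_n(j,l); \mathfrak{G}_n\}$ is a square integrable martingale with
$\sum_{k=1}^{n} E[(\Delta M_k(j,l))^2 \mid \mathfrak{G}_{k-1}] \le n$.

Let $\xi_n(j,l) = \max\{i : 2m + 1 \le i \le n, \ \pi_i(j,l) \le t_{jl}(\hat{\gamma}_i, S_i)\}$, with $\max\emptyset = 2m$;
then there exists a given stratum $(t_{j'}, w_{l'})$ such that $\xi_n(j',l') = \max_{j,l} \xi_n(j,l)$.
Therefore, for any $i > \xi_n(j',l')$, at each stratum $\pi_i(j,l) > t_{jl}(\hat{\gamma}_i, S_i)$ and, by Definition 6.1,
$\varphi_{jl}(\boldsymbol{\pi}_i; \hat{\gamma}_i, S_i) \le t_{jl}(\hat{\gamma}_i, S_i)$. Thus

\begin{align*}
\tilde{N}_n(j',l') &= \tilde{N}_{\xi_n(j',l')+1}(j',l') + \sum_{i=\xi_n(j',l')+2}^{n} \Delta M_i(j',l')
  + \sum_{i=\xi_n(j',l')+2}^{n} \varphi_{j'l'}(\boldsymbol{\pi}_{i-1}; \hat{\gamma}_{i-1}, S_{i-1}) 1_{\{Z_i = (t_{j'}, w_{l'})\}} \\
&\le \tilde{N}_{\xi_n(j',l')}(j',l') + 1 + M_n(j',l') - M_{\xi_n(j',l')+1}(j',l')
  + \sum_{i=\xi_n(j',l')+2}^{n} \varphi_{j'l'}(\boldsymbol{\pi}_{i-1}; \hat{\gamma}_{i-1}, S_{i-1}) 1_{\{Z_i = (t_{j'}, w_{l'})\}} \\
&\le \tilde{N}_{\xi_n(j',l')}(j',l') + 1 + M_n(j',l') - M_{\xi_n(j',l')+1}(j',l')
  + \sum_{i=\xi_n(j',l')+2}^{n} t_{j'l'}(\hat{\gamma}_{i-1}, S_{i-1}) 1_{\{Z_i = (t_{j'}, w_{l'})\}} \\
&= \tilde{N}_{\xi_n(j',l')}(j',l') + 1 + M_n(j',l') - M_{\xi_n(j',l')+1}(j',l')
  + \sum_{i=1}^{n} t_{j'l'}(\hat{\gamma}_{i-1}, S_{i-1}) 1_{\{Z_i = (t_{j'}, w_{l'})\}} \\
&\quad - \sum_{i=1}^{\xi_n(j',l')+1} t_{j'l'}(\hat{\gamma}_{i-1}, S_{i-1}) 1_{\{Z_i = (t_{j'}, w_{l'})\}}.
\end{align*}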

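Moreover, since $\tilde{N}_{\xi_n(j',l')}(j',l') \le N_{\xi_n(j',l')}(j',l')\, t_{j'l'}(\hat{\gamma}_{\xi_n(j',l')}, S_{\xi_n(j',l')})$, then

\begin{align*}
\tilde{N}_n(j',l') - N_n(j',l')\, t_{j'l'}(\hat{\gamma}_n, S_n) \le{} & M_n(j',l') - M_{\xi_n(j',l')+1}(j',l') + 1 \\
& + \Big\{N_{\xi_n(j',l')}(j',l')\, t_{j'l'}(\hat{\gamma}_{\xi_n(j',l')}, S_{\xi_n(j',l')})
  - \sum_{i=1}^{\xi_n(j',l')+1} t_{j'l'}(\hat{\gamma}_{i-1}, S_{i-1}) 1_{\{Z_i = (t_{j'}, w_{l'})\}}\Big\} \\
& - \Big\{N_n(j',l')\, t_{j'l'}(\hat{\gamma}_n, S_n) - \sum_{i=1}^{n} t_{j'l'}(\hat{\gamma}_{i-1}, S_{i-1}) 1_{\{Z_i = (t_{j'}, w_{l'})\}}\Big\}.
\end{align*}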

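Since $p_{jl} > 0$, then as $n \to \infty$

\[
N_n(j,l) \to \infty \quad \text{and} \quad \frac{M_n(j,l)}{N_n(j,l)} \to 0 \quad \text{a.s.} \quad \forall j = 0, \ldots, J; \ l = 0, \ldots, L.
\]

Moreover, as $n \to \infty$ at least one of $\tilde{N}_n(j',l')$ and $[N_n(j',l') - \tilde{N}_n(j',l')]$ tends to
infinity a.s. Therefore $\hat{\gamma}_n$ and $S_n$ have finite limits and thus, as $n \to \infty$

\[
t_{j'l'}(\hat{\gamma}_n, S_n) - \frac{\sum_{i=1}^{n} t_{j'l'}(\hat{\gamma}_{i-1}, S_{i-1}) 1_{\{Z_i = (t_{j'}, w_{l'})\}}}{\sum_{i=1}^{n} 1_{\{Z_i = (t_{j'}, w_{l'})\}}} \to 0 \quad \text{a.s.}
\]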
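Furthermore, as $n \to \infty$

\[
\frac{N_{\xi_n(j',l')}(j',l')}{N_n(j',l')}\Big[t_{j'l'}(\hat{\gamma}_{\xi_n(j',l')}, S_{\xi_n(j',l')})
  - \frac{\sum_{i=1}^{\xi_n(j',l')+1} t_{j'l'}(\hat{\gamma}_{i-1}, S_{i-1}) 1_{\{Z_i = (t_{j'}, w_{l'})\}}}{N_{\xi_n(j',l')}(j',l')}\Big] \to 0 \quad \text{a.s.}
\]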

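and therefore $\lim_{n\to\infty} [\pi_n(j',l') - t_{j'l'}(\hat{\gamma}_n, S_n)]^+ = 0$ a.s.

Analogously, $\lim_{n\to\infty} \{[1 - \pi_n(j',l')] - [1 - t_{j'l'}(\hat{\gamma}_n, S_n)]\}^+ = 0$ a.s. and thus

\[
\lim_{n\to\infty} \pi_n(j',l') = t_{j'l'}(\gamma, \varsigma) \quad \text{a.s.} \tag{A.7}
\]

Since there exists a unique $\boldsymbol{t}(\hat{\gamma}_n, S_n) = [t_{jl}(\hat{\gamma}_n, S_n) : j = 0, \ldots, J; l = 0, \ldots, L]$, which is
continuous, and $\boldsymbol{\varphi}(\boldsymbol{t}(\gamma, \varsigma); \gamma, \varsigma) = \boldsymbol{t}(\gamma, \varsigma)$, then from (A.7) it follows that

\[
\lim_{n\to\infty} \pi_n(j,l) = t_{jl}(\gamma, \varsigma) \quad \text{a.s. for every } (j,l) \ne (j',l')
\]

and Theorem 6.1 follows directly.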